Abstract

This paper shows new tight finite-blocklength bounds for the best achievable lossy joint
source-channel code rate, and demonstrates that joint source-channel code design brings
considerable performance advantage over a separate one in the non-asymptotic regime. A joint
source-channel code maps a block of $k$ source symbols onto a length-$n$ channel codeword, and
the fidelity of reproduction at the receiver end is measured by the probability $\epsilon$ that
the distortion exceeds a given threshold $d$. For memoryless sources and channels, it is
demonstrated that the parameters of the best joint source-channel code must satisfy
$nC - kR(d) \approx \sqrt{nV + k\mathcal{V}(d)}\, Q^{-1}(\epsilon)$, where $C$ and $V$ are the
channel capacity and channel dispersion, respectively; $R(d)$ and $\mathcal{V}(d)$ are the source
rate-distortion and rate-dispersion functions; and $Q$ is the standard Gaussian complementary cdf.
Symbol-by-symbol (uncoded) transmission is known to achieve the Shannon limit when the source and
channel satisfy a certain probabilistic matching condition. In this paper we show that even when
this condition is not satisfied, symbol-by-symbol transmission is, in some cases, the best known
strategy in the non-asymptotic regime.

I. Introduction

For a large class of sources and channels, in the limit of large blocklength, the maximum
achievable joint source-channel coding (JSCC) rate compatible with vanishing excess distortion
probability is characterized by the ratio $\frac{C}{R(d)}$ [1], attainable by separate
source-channel coding (SSCC). However, at finite blocklengths not only is the fundamental limit
no longer achievable but separation ceases to be optimal. Quantifying, as a function of
blocklength and excess distortion probability, the required backoff from $\frac{C}{R(d)}$ as well
as the increase in the rate of source symbols per channel use afforded by an optimal joint design
bears great practical, as well as conceptual, interest.

Prior research in this direction includes the work of Csiszár [2], who demonstrated that
the error exponent of joint source-channel coding outperforms that of separate source-channel
coding. For discrete source-channel pairs with average distortion criterion, Pilc's achievability
bound [4], [5] applies. For the transmission of a Gaussian source over a discrete channel
under the average mean square error constraint, Wyner's achievability bound [6], [7] applies.
Nonasymptotic achievability and converse bounds for a graph-theoretic model of JSCC have
been obtained by Csiszár [8]. Most recently, Tauste Campo et al. [9] showed a number of
finite-blocklength random-coding bounds applicable to the almost-lossless JSCC setup, while Wang
et al. [10] found the dispersion of JSCC for sources and channels with finite alphabets.

In this paper we give a non-asymptotic analysis of joint source-channel coding including
several achievability and converse bounds, which hold in wide generality and are tight enough
to determine the dispersion of joint source-channel coding for the transmission of an abstract
memoryless source over either a discrete memoryless channel (DMC) or a Gaussian channel, under
an arbitrary fidelity criterion. We also investigate the penalty incurred by separate
source-channel coding using both the source-channel dispersion and the particularization of our
new bounds to

(i) the binary source and the binary symmetric channel with bit error rate fidelity criterion and

(ii) the Gaussian source and Gaussian channel under mean-square error distortion.

Further, we revisit the dilemma of whether one should or should not code when operating
under delay constraints. Gastpar et al. [11] gave a set of necessary and sufficient conditions
on the source, its distortion measure, the channel and its cost function in order for
symbol-by-symbol transmission to attain the minimum average distortion. In these curious cases,
the source and the channel are probabilistically matched. We show that whenever the source and the
channel are probabilistically matched so that symbol-by-symbol coding achieves the minimum
average distortion, it also achieves the dispersion of joint source-channel coding. Moreover,
even in the absence of such a match between the source and the channel, symbol-by-symbol
transmission, though asymptotically suboptimal, might outperform in the non-asymptotic regime
not only separate source-channel coding but also our random-coding achievability bound.

The non-asymptotic theoretical limit of interest in this paper is the maximum number of
source symbols per channel input transmissible at a given channel blocklength under a constraint
on the probability of exceeding a given distortion level, regardless of decoding complexity. The
excess distortion constraint is, in a way, more fundamental than the average distortion
constraint, which is the figure of merit in [4]-[7], because it gives full information about the
distribution (and not just the mean) of the distortion incurred at the decoder output.

The rest of the paper is organized as follows. Section II summarizes basic definitions and
notation. Sections III and IV introduce the new converse and achievability bounds to the
maximum achievable coding rate, respectively. A Gaussian approximation analysis of the new bounds
is presented in Section V. The evaluation of the bounds and the approximation is performed
for two important special cases: the transmission of a binary memoryless source (BMS) over a
binary symmetric channel (BSC) with bit error rate distortion (Section VI) and the transmission
of a Gaussian memoryless source (GMS) with mean-square error distortion over an AWGN
channel with a total power constraint (Section VII). Section VIII focuses on symbol-by-symbol
transmission.

II. Definitions 

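A lossy source-channel code is a pair of (possibly randomized) mappings
$f\colon \mathcal{M} \mapsto \mathcal{X}$ and $g\colon \mathcal{Y} \mapsto \widehat{\mathcal{M}}$.
A distortion measure $\mathsf{d}\colon \mathcal{M} \times \widehat{\mathcal{M}} \mapsto [0, +\infty)$
is used to quantify the performance of the lossy code. A cost function
$c\colon \mathcal{X} \mapsto [0, +\infty)$ may be imposed on the channel inputs.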

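Definition 1. A $(d, \epsilon, \alpha)$ code for
$\{\mathcal{M}, \mathcal{X}, \mathcal{Y}, \widehat{\mathcal{M}}, P_S, \mathsf{d}, P_{Y|X}, c\}$ is a
source-channel code with $\mathbb{P}[\mathsf{d}(S, g(Y)) > d] \leq \epsilon$ and either
$\mathbb{E}[c(X)] \leq \alpha$ (average cost constraint) or
$\sup_{x \in f(\mathcal{M})} c(x) \leq \alpha$ (maximal cost constraint), where $f(S) = X$
(see Fig. 1).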

If there is no cost constraint ($c(x) = 0$ for all $x \in \mathcal{X}$), we say a
'$(d, \epsilon)$ code' instead of a '$(d, \epsilon, \alpha)$ code'.

Fig. 1. A $(d, \epsilon)$ joint source-channel code.

The special case $d = 0$ and $\mathsf{d}(s, z) = 1\{s \neq z\}$ corresponds to almost-lossless
compression. If, in addition, $P_S$ is equiprobable on an alphabet of cardinality
$|\mathcal{M}| = |\widehat{\mathcal{M}}| = M$, a $(0, \epsilon, \alpha)$ code in Definition 1
corresponds to an $(M, \epsilon, \alpha)$ channel code (i.e. a code with $M$ codewords and
average error probability $\epsilon$ and cost $\alpha$). On the other hand, if $P_{Y|X}$ is an
identity mapping on an alphabet of cardinality $M$ without cost constraints, a $(d, \epsilon)$ code
in Definition 1 corresponds to an $(M, d, \epsilon)$ lossy compression code (as e.g. defined
in [12]).

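Definition 2. In the conventional fixed-to-fixed (or block) setting in which $\mathcal{X}$ and
$\mathcal{Y}$ are the $n$-fold Cartesian products of alphabets $\mathcal{A}$ and $\mathcal{B}$,
$\mathcal{M}$ and $\widehat{\mathcal{M}}$ are the $k$-fold Cartesian products of alphabets
$\mathcal{S}$ and $\widehat{\mathcal{S}}$, and
$\mathsf{d}_k\colon \mathcal{S}^k \times \widehat{\mathcal{S}}^k \mapsto [0, +\infty)$,
$c_n\colon \mathcal{A}^n \mapsto [0, +\infty)$, a $(d, \epsilon, \alpha)$ code for
$\{\mathcal{S}^k, \mathcal{A}^n, \mathcal{B}^n, \widehat{\mathcal{S}}^k, P_{S^k}, \mathsf{d}_k, P_{Y^n|X^n}, c_n\}$
is called a $(k, n, d, \epsilon, \alpha)$ code (a $(k, n, d, \epsilon)$ code if there is no cost
constraint).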

Definition 3. Fix $\epsilon$, $d$, $\alpha$ and the channel blocklength $n$. The maximum
achievable source blocklength and coding rate (source symbols per channel use) are defined by,
respectively

\begin{align}
k^\star(n, d, \epsilon, \alpha) &= \max\{k \colon \exists\, (k, n, d, \epsilon, \alpha) \text{ code}\} \tag{1}\\
R(n, d, \epsilon, \alpha) &= \frac{k^\star(n, d, \epsilon, \alpha)}{n} \tag{2}
\end{align}

Alternatively, fix $\epsilon$, $\alpha$, source blocklength $k$ and channel blocklength $n$. The
minimum achievable excess distortion is defined by

$$D(k, n, \epsilon, \alpha) = \inf\{d \colon \exists\, (k, n, d, \epsilon, \alpha) \text{ code}\} \tag{3}$$

Denote, for a given $P_{Y|X}$,

$$\mathbb{C}(\alpha) = \sup_{P_X \colon\, \mathbb{E}[c(X)] \leq \alpha} I(X; Y) \tag{4}$$

and, for a given $P_S$ and the distortion measure
$\mathsf{d}\colon \mathcal{M} \times \widehat{\mathcal{M}} \mapsto [0, +\infty)$,

$$\mathbb{R}_S(d) = \inf_{P_{Z|S} \colon\, \mathbb{E}[\mathsf{d}(S, Z)] \leq d} I(S; Z) \tag{5}$$

We impose the following basic restrictions on $P_{Y|X}$, $P_S$ and the distortion measure:

(a) $\mathbb{R}_S(d)$ is finite for some $d$, i.e. $d_{\min} < \infty$, where

$$d_{\min} = \inf\{d \colon \mathbb{R}_S(d) < \infty\} \tag{6}$$

(b) The infimum in (5) is achieved by a unique $P_{Z^\star|S}$.

(c) The supremum in (4) is achieved by a unique $P_{X^\star}$.
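As a concrete instance of (4) and (5), the following minimal sketch evaluates both extremal quantities in closed form for one of the special cases studied later: a binary memoryless source with bit error rate distortion sent over a binary symmetric channel without cost constraint. The function names and the numerical parameters are illustrative assumptions, not notation from the paper; the closed forms $1 - h(\delta)$ and $h(p) - h(d)$ are the standard textbook expressions.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by continuity."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(delta: float) -> float:
    """Supremum in (4) for a BSC with crossover probability delta, no cost constraint."""
    return 1.0 - binary_entropy(delta)


def bms_rate_distortion(p: float, d: float) -> float:
    """Infimum in (5) for a Bernoulli(p) source under Hamming distortion (bits)."""
    p = min(p, 1.0 - p)
    if d >= p:
        return 0.0  # reproducing the more likely symbol already meets the constraint
    return binary_entropy(p) - binary_entropy(d)


# Hypothetical operating point: fair source, d = 0.05, crossover probability 0.11.
capacity = bsc_capacity(0.11)
rate_distortion = bms_rate_distortion(0.5, 0.05)
asymptotic_ratio = capacity / rate_distortion  # asymptotic source symbols per channel use
```

The ratio computed on the last line is the asymptotic benchmark against which the finite-blocklength rate $R(n, d, \epsilon, \alpha)$ of Definition 3 is compared.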

The dispersion, which serves to quantify the penalty on the rate of the best JSCC code induced 
by the finite blocklength, is defined as follows. 

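Definition 4. Fix $\alpha$ and $d > d_{\min}$. The rate-dispersion function of joint
source-channel coding (source samples squared per channel use) is defined as

$$\mathcal{V}(d, \alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty}\, n \left( \frac{\frac{C(\alpha)}{R(d)} - R(n, d, \epsilon, \alpha)}{Q^{-1}(\epsilon)} \right)^2 \tag{7}$$

where $C(\alpha)$ and $R(d)$ are the channel capacity-cost and source rate-distortion functions,
respectively.$^1$

The distortion-dispersion function of joint source-channel coding is defined as

$$\mathcal{W}(R, \alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty}\, n \left( \frac{D\left(\frac{C(\alpha)}{R}\right) - D(nR, n, \epsilon, \alpha)}{Q^{-1}(\epsilon)} \right)^2 \tag{8}$$

where $D(\cdot)$ is the distortion-rate function of the source.

If there is no cost constraint, we will simplify notation and drop $\alpha$ from (1), (2), (3),
(4), (7) and (8).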

$^1$ While for memoryless sources and channels, $C(\alpha) = \mathbb{C}(\alpha)$ and
$R(d) = \mathbb{R}_S(d)$ given by (4) and (5) evaluated with single-letter distributions, it is
important to distinguish between the operational definitions and the extremal mutual information
quantities, since the core results in this paper allow for memory.

Definition 5 ($d$-tilted information [12]). For $d > d_{\min}$, the $d$-tilted information in $s$
is defined as

$$\jmath_S(s, d) = \log \frac{1}{\mathbb{E}\left[\exp\{\lambda^\star d - \lambda^\star \mathsf{d}(s, Z^\star)\}\right]} \tag{9}$$

where the expectation is with respect to the unconditional distribution of $Z^\star$, and

$$\lambda^\star = -\mathbb{R}_S'(d) \tag{10}$$

The following properties of $d$-tilted information, proven in [13], are used in the sequel.

\begin{align}
\jmath_S(s, d) &= \imath_{S;Z^\star}(s; z) + \lambda^\star \mathsf{d}(s, z) - \lambda^\star d \tag{11}\\
\mathbb{E}[\jmath_S(S, d)] &= \mathbb{R}_S(d) \tag{12}\\
\mathbb{E}\left[\exp\{\lambda^\star d - \lambda^\star \mathsf{d}(S, z) + \jmath_S(S, d)\}\right] &\leq 1 \tag{13}
\end{align}

where (11) holds for $P_{Z^\star}$-almost every $z$, while (13) holds for all
$z \in \widehat{\mathcal{M}}$, and

$$\imath_{S;Z}(s; z) = \log \frac{dP_{Z|S=s}}{dP_Z}(z) \tag{14}$$

denotes the information density of the joint distribution $P_{SZ}$ at $(s, z)$. We can define the
right side of (14) for a given $(P_{Z|S}, P_Z)$ even if there is no $P_S$ such that the marginal
of $P_S P_{Z|S}$ is $P_Z$. We use the same notation $\imath_{S;Z}$ for that more general function.
To extend Definition 5 to the lossless case, for discrete random variables we define 0-tilted
information as

$$\jmath_S(s, 0) = \imath_S(s) \tag{15}$$

where

$$\imath_S(s) = \log \frac{1}{P_S(s)} \tag{16}$$

is the information in outcome $s \in \mathcal{M}$.

Finally, the distortion $d$-ball around $s \in \mathcal{M}$ is denoted by

$$B_d(s) = \{z \in \widehat{\mathcal{M}} \colon \mathsf{d}(s, z) \leq d\} \tag{17}$$
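To make Definition 5 concrete, here is a minimal sketch that evaluates (9) numerically for a Bernoulli($p$) source under Hamming distortion and compares it with the closed form $\imath_S(s) - h(d)$. It assumes $0 < d < p \leq 1/2$, natural logarithms, and the standard optimal reproduction marginal $P[Z^\star = 1] = (p - d)/(1 - 2d)$ for this source; the function names are illustrative, not taken from the paper.

```python
import math


def entropy_nats(p: float) -> float:
    """Binary entropy in nats."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def d_tilted_information_bms(s: int, p: float, d: float) -> float:
    """Evaluate (9) in nats for a Bernoulli(p) source with Hamming distortion.

    Assumes 0 < d < p <= 1/2, so that R_S(d) = h(p) - h(d).
    """
    # Marginal of Z* induced by the optimal (backward BSC(d)) test channel.
    q = (p - d) / (1.0 - 2.0 * d)
    # lambda* = -R_S'(d) = h'(d) = log((1 - d) / d), as in (10).
    lam = math.log((1.0 - d) / d)
    p_z = {0: 1.0 - q, 1: q}
    expectation = sum(
        p_z[z] * math.exp(lam * d - lam * float(s != z)) for z in (0, 1)
    )
    return -math.log(expectation)


p, d = 0.4, 0.1  # hypothetical source bias and distortion threshold
p_s = {0: 1.0 - p, 1: p}
tilted = {s: d_tilted_information_bms(s, p, d) for s in (0, 1)}
closed_form = {s: -math.log(p_s[s]) - entropy_nats(d) for s in (0, 1)}
mean_tilted = sum(p_s[s] * tilted[s] for s in (0, 1))  # should equal R_S(d), cf. (12)
```

In this example the expectation of the $d$-tilted information reproduces the rate-distortion function, in agreement with property (12).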

$^2$ All log's and exp's are in an arbitrary common base.

So as not to clutter notation, in Sections III and IV we assume that there are no cost constraints.
However, all results in those sections generalize to the case of a maximal cost constraint by
considering $X$ whose distribution is supported in the subset of allowable channel inputs:

$$F(\alpha) = \{x \in \mathcal{X} \colon c(x) \leq \alpha\} \tag{18}$$

rather than the entire channel input alphabet $\mathcal{X}$.

III. Converses 

A. Converses via d-tilted information 

Our first result is a general converse bound. 

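Theorem 1 (Converse). The existence of a $(d, \epsilon)$ code for $S$ and $P_{Y|X}$ requires that

\begin{align}
\epsilon &\geq \inf_{P_{X|S}} \sup_{\gamma > 0} \left\{ \sup_{P_{\bar{Y}}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(X; Y) \geq \gamma\right] - \exp(-\gamma) \right\} \tag{19}\\
&\geq \sup_{\gamma > 0} \left\{ \sup_{P_{\bar{Y}}} \mathbb{E}\left[ \inf_{x \in \mathcal{X}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(x; Y) \geq \gamma \mid S\right] \right] - \exp(-\gamma) \right\} \tag{20}
\end{align}

where in (19), $S - X - Y$, and the conditional probability in (20) is with respect to $Y$
distributed according to $P_{Y|X=x}$ (independent of $S$), and

$$\imath_{X;\bar{Y}}(x; y) = \log \frac{dP_{Y|X=x}}{dP_{\bar{Y}}}(y) \tag{21}$$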

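Proof: Fix $\gamma$ and the $(d, \epsilon)$ code $(P_{X|S}, P_{Z|Y})$. Fix an arbitrary probability
measure $P_{\bar{Y}}$ on $\mathcal{Y}$. Let $P_{\bar{Y}} \to P_{Z|Y} \to P_{\bar{Z}}$, i.e.
$P_{\bar{Z}}(z) = \sum_{y \in \mathcal{Y}} P_{Z|Y}(z|y) P_{\bar{Y}}(y)$.$^3$ We can write the
probability in the right side of (19) as

\begin{align}
&\mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(X; Y) \geq \gamma\right] \notag\\
&= \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(X; Y) \geq \gamma,\ \mathsf{d}(S, Z) > d\right]
 + \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(X; Y) \geq \gamma,\ \mathsf{d}(S, Z) \leq d\right] \tag{22}\\
&\leq \epsilon + \sum_{s \in \mathcal{M}} P_S(s) \sum_{x \in \mathcal{X}} P_{X|S}(x|s) \sum_{y \in \mathcal{Y}} \sum_{z \in B_d(s)} P_{Z|Y}(z|y) P_{Y|X}(y|x)\, 1\left\{P_{Y|X}(y|x) \leq P_{\bar{Y}}(y) \exp(\jmath_S(s, d) - \gamma)\right\} \tag{23}\\
&\leq \epsilon + \exp(-\gamma) \sum_{s \in \mathcal{M}} P_S(s) \exp(\jmath_S(s, d)) \sum_{y \in \mathcal{Y}} P_{\bar{Y}}(y) \sum_{z \in B_d(s)} P_{Z|Y}(z|y) \sum_{x \in \mathcal{X}} P_{X|S}(x|s) \tag{24}\\
&= \epsilon + \exp(-\gamma) \sum_{s \in \mathcal{M}} P_S(s) \exp(\jmath_S(s, d)) \sum_{y \in \mathcal{Y}} P_{\bar{Y}}(y) \sum_{z \in B_d(s)} P_{Z|Y}(z|y) \tag{25}\\
&= \epsilon + \exp(-\gamma) \sum_{s \in \mathcal{M}} P_S(s) \exp(\jmath_S(s, d))\, P_{\bar{Z}}(B_d(s)) \tag{26}\\
&\leq \epsilon + \exp(-\gamma) \sum_{z \in \widehat{\mathcal{M}}} P_{\bar{Z}}(z) \sum_{s \in \mathcal{M}} P_S(s) \exp\left(\jmath_S(s, d) + \lambda^\star d - \lambda^\star \mathsf{d}(s, z)\right) \tag{27}\\
&\leq \epsilon + \exp(-\gamma) \tag{28}
\end{align}

where (28) is due to (13). Optimizing over $\gamma > 0$ and $P_{\bar{Y}}$, we obtain the best
possible bound for a given encoder $P_{X|S}$. To obtain a code-independent converse, we simply
optimize over $P_{X|S}$, and (19) follows. To show (20), we weaken (19) as

$$\epsilon \geq \sup_{\gamma > 0} \left\{ \sup_{P_{\bar{Y}}} \inf_{P_{X|S}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(X; Y) \geq \gamma\right] - \exp(-\gamma) \right\} \tag{29}$$

and observe that for any $P_{\bar{Y}}$,

\begin{align}
&\inf_{P_{X|S}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(X; Y) \geq \gamma\right] \notag\\
&= \sum_{s \in \mathcal{M}} P_S(s) \inf_{P_{X|S=s}} \sum_{x \in \mathcal{X}} P_{X|S}(x|s) \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x)\, 1\left\{\jmath_S(s, d) - \imath_{X;\bar{Y}}(x; y) \geq \gamma\right\} \tag{30}\\
&= \sum_{s \in \mathcal{M}} P_S(s) \inf_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x)\, 1\left\{\jmath_S(s, d) - \imath_{X;\bar{Y}}(x; y) \geq \gamma\right\} \tag{31}\\
&= \mathbb{E}\left[ \inf_{x \in \mathcal{X}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(x; Y) \geq \gamma \mid S\right] \right] \tag{32}
\end{align}
■

$^3$ We write summations over alphabets for simplicity. Unless stated otherwise, all our results
hold for abstract probability spaces.

An immediate corollary to Theorem 1 is the following result.

Theorem 2 (Converse). Assume that there exists a distribution $P_{\bar{Y}}$ such that the
distribution of $\imath_{X;\bar{Y}}(x; Y)$ (according to $P_{Y|X=x}$) does not depend on the choice
of $x \in \mathcal{X}$. If a $(d, \epsilon)$ code for $S$ and $P_{Y|X}$ exists, then

$$\epsilon \geq \sup_{\gamma > 0} \left\{ \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}}(x; Y) \geq \gamma\right] - \exp(-\gamma) \right\} \tag{33}$$

for an arbitrary $x \in \mathcal{X}$. The probability measure $\mathbb{P}$ in (33) is generated by
$P_S P_{Y|X=x}$.

Proof: Observe that under the assumption the conditional probability in the right side of
(20) is the same regardless of the choice of $x \in \mathcal{X}$. ■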
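The bound (33) can be evaluated exactly in simple symmetric settings. The following minimal sketch does so under stated assumptions: $k$ fair coin flips with bit error rate distortion $d < 1/2$ are sent over $n$ uses of a BSC with crossover probability $\delta$, and $P_{\bar{Y}}$ is taken to be equiprobable on $\{0,1\}^n$. Under these assumptions $\jmath_{S^k}(s^k, d) = k(\log 2 - h(d))$ is deterministic, and $\imath_{X;\bar{Y}}(x^n; Y^n)$ depends on $Y^n$ only through the number of channel flips, which is binomial. The supremum over $\gamma$ is replaced by a maximum over a finite grid, which can only weaken the bound. Function and parameter names are illustrative.

```python
import math


def entropy_nats(p: float) -> float:
    """Binary entropy in nats."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def converse_fair_bms_bsc(k: int, n: int, d: float, delta: float, gammas) -> float:
    """Lower bound on the excess-distortion probability obtained from (33).

    Assumes a fair binary source, Hamming distortion with 0 < d < 1/2, a BSC(delta),
    and an equiprobable auxiliary output distribution. All logarithms are natural.
    """
    source_term = k * (math.log(2.0) - entropy_nats(d))  # d-tilted information of the block
    # pmf of the number of flips T ~ Binomial(n, delta) and the information density
    # log(P(y|x) / 2^-n) as a function of T.
    pmf = [math.comb(n, t) * delta**t * (1.0 - delta) ** (n - t) for t in range(n + 1)]
    density = [
        n * math.log(2.0) + t * math.log(delta) + (n - t) * math.log(1.0 - delta)
        for t in range(n + 1)
    ]
    best = 0.0  # the trivial bound: the error probability is nonnegative
    for gamma in gammas:
        prob = sum(w for w, i in zip(pmf, density) if source_term - i >= gamma)
        best = max(best, prob - math.exp(-gamma))
    return best


gamma_grid = [0.25 * i for i in range(1, 81)]
# Hypothetical operating points: n = 100 channel uses, d = 0.05, delta = 0.11.
bound_low_rate = converse_fair_bms_bsc(40, 100, 0.05, 0.11, gamma_grid)
bound_high_rate = converse_fair_bms_bsc(80, 100, 0.05, 0.11, gamma_grid)
```

With these numbers, $k = 80$ asks for more than $nC$ bits' worth of source information, so the bound is far from zero, whereas $k = 40$ sits well below the asymptotic limit and the bound is nearly vacuous.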

Remark 1. Theorems 1 and 2 still hold in the case $d = 0$ and $\mathsf{d}(x, y) = 1\{x \neq y\}$,
which corresponds to almost-lossless data compression. Indeed, recalling (15), it is easy to see
that the proof of Theorem 1 applies, skipping the now unnecessary step (27), and, therefore, (19)
reduces to

$$\epsilon \geq \inf_{P_{X|S}} \sup_{\gamma > 0} \left\{ \sup_{P_{\bar{Y}}} \mathbb{P}\left[\imath_S(S) - \imath_{X;\bar{Y}}(X; Y) \geq \gamma\right] - \exp(-\gamma) \right\} \tag{34}$$

Remark 2. Our converse for lossy source coding in [12, Theorem 7] can be viewed as a particular
case of the result in Theorem 2. Indeed, if $\mathcal{X} = \mathcal{Y} = \{1, \ldots, M\}$ and
$P_{Y|X}(m|m) = 1$, $P_{\bar{Y}}(1) = \ldots = P_{\bar{Y}}(M) = \frac{1}{M}$, then (33) becomes

$$\epsilon \geq \sup_{\gamma > 0} \left\{ \mathbb{P}\left[\jmath_S(S, d) \geq \log M + \gamma\right] - \exp(-\gamma) \right\} \tag{35}$$

which is precisely [12, Theorem 7].

The next result generalizes Theorem 1. When we apply Theorem 3 in Section V to find the
dispersion of JSCC, we will let $T$ be the number of channel input types. If $T = 1$, Theorem 3
reduces to Theorem 1.

Theorem 3 (Converse). Fix a positive integer $T$ and a partition $\{\mathcal{X}_t,\ t = 1, \ldots, T\}$
of $\mathcal{X}$. Define $\mathsf{T}\colon \mathcal{X} \mapsto \{1, \ldots, T\}$, $\mathsf{T}(x) = t$
if $x \in \mathcal{X}_t$. The existence of a $(d, \epsilon)$ code for $S$ and $P_{Y|X}$ requires
that

\begin{align}
\epsilon &\geq \inf_{P_{X|S}} \sup_{\gamma > 0} \left\{ \sup_{\{P_{\bar{Y}_t}\}_{t=1}^T} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}_{\mathsf{T}(X)}}(X; Y) \geq \gamma\right] - T \exp(-\gamma) \right\} \tag{36}\\
&\geq \sup_{\gamma > 0} \left\{ \sup_{\{P_{\bar{Y}_t}\}_{t=1}^T} \mathbb{E}\left[ \inf_{x \in \mathcal{X}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}_{\mathsf{T}(x)}}(x; Y) \geq \gamma \mid S\right] \right] - T \exp(-\gamma) \right\} \tag{37}
\end{align}

where in (36), $S - X - Y$, and in (37), the probability measure is generated by $P_S P_{Y|X=x}$,

$$\imath_{X;\bar{Y}_t}(x; y) = \log \frac{dP_{Y|X=x}}{dP_{\bar{Y}_t}}(y) \tag{38}$$

and the supremum is over $\gamma > 0$ and all collections $\{P_{\bar{Y}_1}, \ldots, P_{\bar{Y}_T}\}$
of probability distributions on the channel output alphabet $\mathcal{Y}$ parameterized by the
partition index $t$.

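Proof: Fix an arbitrary collection of probability measures $P_{\bar{Y}_t}$ on $\mathcal{Y}$
parameterized by the partition index $t$. Let
$P_{\bar{Z}_t}(z) = \sum_{y \in \mathcal{Y}} P_{Z|Y}(z|y) P_{\bar{Y}_t}(y)$, where $P_{Z|Y}$ is the
decoder.

\begin{align}
&\mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}_{\mathsf{T}(X)}}(X; Y) \geq \gamma\right] \notag\\
&\leq \epsilon + \sum_{s \in \mathcal{M}} P_S(s) \sum_{t=1}^{T} \sum_{x \in \mathcal{X}_t} P_{X|S}(x|s) \sum_{y \in \mathcal{Y}} \sum_{z \in B_d(s)} P_{Z|Y}(z|y) P_{Y|X}(y|x)\, 1\left\{P_{Y|X}(y|x) \leq P_{\bar{Y}_t}(y) \exp(\jmath_S(s, d) - \gamma)\right\} \tag{39}\\
&\leq \epsilon + \exp(-\gamma) \sum_{t=1}^{T} \sum_{s \in \mathcal{M}} P_S(s) \exp(\jmath_S(s, d)) \sum_{y \in \mathcal{Y}} P_{\bar{Y}_t}(y) \sum_{z \in B_d(s)} P_{Z|Y}(z|y) \sum_{x \in \mathcal{X}_t} P_{X|S}(x|s) \tag{40}\\
&\leq \epsilon + \exp(-\gamma) \sum_{t=1}^{T} \sum_{s \in \mathcal{M}} P_S(s) \exp(\jmath_S(s, d)) \sum_{y \in \mathcal{Y}} P_{\bar{Y}_t}(y) \sum_{z \in B_d(s)} P_{Z|Y}(z|y) \tag{41}\\
&\leq \epsilon + \exp(-\gamma) \sum_{t=1}^{T} \sum_{s \in \mathcal{M}} P_S(s) \exp(\jmath_S(s, d))\, P_{\bar{Z}_t}(B_d(s)) \tag{42}\\
&\leq \epsilon + \exp(-\gamma) \sum_{t=1}^{T} \sum_{z \in \widehat{\mathcal{M}}} P_{\bar{Z}_t}(z) \sum_{s \in \mathcal{M}} P_S(s) \exp\left(\jmath_S(s, d) + \lambda^\star d - \lambda^\star \mathsf{d}(s, z)\right) \tag{43}\\
&\leq \epsilon + T \exp(-\gamma) \tag{44}
\end{align}

where (44) is due to (13). Optimizing over $\gamma > 0$ and $P_{\bar{Y}_1}, \ldots, P_{\bar{Y}_T}$,
we obtain the best possible bound for a given encoder $P_{X|S}$. To obtain a code-independent
converse, we simply optimize over $P_{X|S}$, and (36) follows. To show (37), we weaken (36) as

$$\epsilon \geq \sup_{\gamma > 0} \left\{ \sup_{\{P_{\bar{Y}_t}\}_{t=1}^T} \inf_{P_{X|S}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}_{\mathsf{T}(X)}}(X; Y) \geq \gamma\right] - T \exp(-\gamma) \right\} \tag{45}$$

and observe that for any choice of $\{P_{\bar{Y}_1}, \ldots, P_{\bar{Y}_T}\}$,

\begin{align}
&\inf_{P_{X|S}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}_{\mathsf{T}(X)}}(X; Y) \geq \gamma\right] \tag{46}\\
&= \sum_{s \in \mathcal{M}} P_S(s) \inf_{P_{X|S=s}} \sum_{x \in \mathcal{X}} P_{X|S}(x|s) \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x)\, 1\left\{\jmath_S(s, d) - \imath_{X;\bar{Y}_{\mathsf{T}(x)}}(x; y) \geq \gamma\right\} \tag{47}\\
&= \sum_{s \in \mathcal{M}} P_S(s) \inf_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x)\, 1\left\{\jmath_S(s, d) - \imath_{X;\bar{Y}_{\mathsf{T}(x)}}(x; y) \geq \gamma\right\} \tag{48}\\
&= \mathbb{E}\left[ \inf_{x \in \mathcal{X}} \mathbb{P}\left[\jmath_S(S, d) - \imath_{X;\bar{Y}_{\mathsf{T}(x)}}(x; Y) \geq \gamma \mid S\right] \right] \tag{49}
\end{align}
■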

B. Converses via hypothesis testing and list decoding 

To show a joint source-channel converse in [3], Csiszár used a list decoder, which outputs a list
of $L$ elements drawn from $\mathcal{M}$. While traditionally list decoding has only been
considered in the context of finite alphabet sources, we generalize the setting to sources with
abstract alphabets. In our setup, the encoder is the random transformation $P_{X|S}$, and the
decoder is defined as follows.

Definition 6 (List decoder). An $(L, Q_S)$ list decoder is a random transformation
$P_{\tilde{S}|Y}$, where $\tilde{S}$ takes values on $Q_S$-measurable sets with $Q_S$-measure not
exceeding $L$:

$$Q_S(\tilde{S}) \leq L \tag{50}$$

The error probability with this type of list decoding is the probability that the source output
$S$ is not on the decoder output list for $Y$:

$$\mathbb{P}[S \notin \tilde{S}] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \sum_{\tilde{s} \in \mathcal{M}^{(L)}} \sum_{s \notin \tilde{s}} P_{\tilde{S}|Y}(\tilde{s}|y) P_{Y|X}(y|x) P_{X|S}(x|s) P_S(s) \tag{51}$$

where $\mathcal{M}^{(L)}$ consists of $Q_S$-measurable subsets of $\mathcal{M}$ with $Q_S$-measure
not exceeding $L$.

Definition 7 (List code). An $(\epsilon, L, Q_S)$ list code is a pair of random transformations
$(P_{X|S}, P_{\tilde{S}|Y})$ such that (50) holds and the list error probability (51) does not
exceed $\epsilon$.

Of course, letting $Q_S = U_S$, where $U_S$ is the counting measure on $\mathcal{M}$, we recover
the conventional list decoder definition. The almost-lossless JSCC setting ($d = 0$) in
Definition 1 corresponds to $L = 1$, $Q_S = U_S$. If the source alphabet is the real line, it is
reasonable to let $Q_S$ be the Lebesgue measure.

Any converse for list coding implies a converse for lossy coding. To see this, observe that
any $(d, \epsilon)$ lossy code can be converted to a list code with list error probability not
exceeding $\epsilon$ by feeding the lossy decoder output to a function that outputs the list of
all source outcomes $s$ within distortion $d$ from the output $z \in \widehat{\mathcal{M}}$ of the
original lossy decoder. Therefore, the set of all $(d, \epsilon)$ lossy codes is included in the
set of all list codes with list error probability $\leq \epsilon$ and list size

$$L = \max_{z \in \widehat{\mathcal{M}}} Q_S\left(\{s \colon \mathsf{d}(s, z) \leq d\}\right) \tag{52}$$

Denote by

$$\beta_\alpha(P, Q) = \min_{P_{W|X} \colon\, \mathbb{P}[W = 1] \geq \alpha} \mathbb{Q}[W = 1] \tag{53}$$

the optimal performance achievable among all randomized tests
$P_{W|X}\colon \mathcal{X} \mapsto \{0, 1\}$ between probability distributions $P$ and $Q$ on
$\mathcal{X}$ (1 indicates that the test chooses $P$).$^4$ In fact, $Q$ need not be a probability
measure, it just needs to be $\sigma$-finite in order for the Neyman-Pearson lemma and related
results to hold.
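For finite alphabets, the minimum in (53) is attained by a likelihood-ratio test with randomization on the boundary (Neyman-Pearson). The following minimal sketch computes it by sorting the outcomes by likelihood ratio and accepting them greedily until the constraint on the probability under $P$ is met. The function name and the list-based representation of $P$ and $Q$ are illustrative assumptions; $Q$ is allowed to be an unnormalized measure such as the counting measure.

```python
from typing import Sequence


def beta_alpha(alpha: float, p: Sequence[float], q: Sequence[float]) -> float:
    """Minimum Q[W = 1] over randomized tests with P[W = 1] >= alpha (finite alphabet).

    p must be a probability vector; q may be any nonnegative vector of the same length.
    """
    def likelihood_ratio(i: int) -> float:
        return p[i] / q[i] if q[i] > 0.0 else float("inf")

    # Accept outcomes in order of decreasing likelihood ratio dP/dQ.
    order = sorted(range(len(p)), key=likelihood_ratio, reverse=True)
    remaining = alpha  # probability under P that still has to be accepted
    beta = 0.0
    for i in order:
        if remaining <= 0.0:
            break
        if p[i] <= 0.0:
            continue  # accepting this outcome would not help meet the constraint
        fraction = min(1.0, remaining / p[i])  # randomize on the boundary outcome
        beta += fraction * q[i]
        remaining -= fraction * p[i]
    return beta


# With the counting measure in the role of Q, beta is the (fractional) size of the
# smallest list that captures probability alpha under P.
list_size = beta_alpha(0.8, [0.5, 0.3, 0.2], [1.0, 1.0, 1.0])
```

The last line illustrates the link with list decoding: against the counting measure, the test trades off list size against the probability of the source outcome being on the list.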

The hypothesis testing converse for channel coding [14, Theorem 27] can be generalized to
joint source-channel coding with list decoding as follows.

Theorem 4 (Converse). Fix $P_S$ and $P_{Y|X}$, and let $Q_S$ be a $\sigma$-finite measure. The
existence of an $(\epsilon, L, Q_S)$ list code requires that

$$\inf_{P_{X|S}} \sup_{P_{\bar{Y}}} \beta_{1-\epsilon}\left(P_S P_{X|S} P_{Y|X},\ Q_S P_{X|S} P_{\bar{Y}}\right) \leq L \tag{54}$$

where the supremum is over all probability measures $P_{\bar{Y}}$ defined on the channel output
alphabet $\mathcal{Y}$.

Proof: Fix $Q_S$, the encoder $P_{X|S}$, and an auxiliary $\sigma$-finite conditional measure
$Q_{Y|XS}$. Consider the (not necessarily optimal) test for deciding between
$P_{SXY} = P_S P_{X|S} P_{Y|X}$ and $Q_{SXY} = Q_S P_{X|S} Q_{Y|XS}$ which chooses $P_{SXY}$ if $S$
is on the decoder output list when $Y$ is observed at the channel output.

$^4$ Throughout, $P$, $Q$ denote distributions, whereas $\mathbb{P}$, $\mathbb{Q}$ are used for the
corresponding probabilities of events on the underlying probability space.

According to $\mathbb{P}$, the probability measure generated by $P_{SXY}$, the probability that the
test chooses $P_{SXY}$ is given by

$$\mathbb{P}[S \in \tilde{S}] \geq 1 - \epsilon \tag{55}$$

Since $\mathbb{Q}[S \in \tilde{S}]$ is the measure of the event that the test chooses $P_{SXY}$
when $Q_{SXY}$ is true, and the optimal test cannot perform worse than the possibly suboptimal one
that we selected, it follows that

$$\beta_{1-\epsilon}\left(P_S P_{X|S} P_{Y|X},\ Q_S P_{X|S} Q_{Y|XS}\right) \leq \mathbb{Q}[S \in \tilde{S}] \tag{56}$$

Now, fix an arbitrary probability measure $P_{\bar{Y}}$ on $\mathcal{Y}$. Choosing
$Q_{Y|XS} = P_{\bar{Y}}$, the inequality in (56) can be weakened as follows.

\begin{align}
&\mathbb{Q}[S \in \tilde{S}] \tag{57}\\
&= \sum_{y \in \mathcal{Y}} P_{\bar{Y}}(y) \sum_{\tilde{s} \in \mathcal{M}^{(L)}} P_{\tilde{S}|Y}(\tilde{s}|y) \sum_{s \in \tilde{s}} Q_S(s) \sum_{x \in \mathcal{X}} P_{X|S}(x|s) \tag{58}\\
&= \sum_{y \in \mathcal{Y}} P_{\bar{Y}}(y) \sum_{\tilde{s} \in \mathcal{M}^{(L)}} P_{\tilde{S}|Y}(\tilde{s}|y) \sum_{s \in \tilde{s}} Q_S(s) \tag{59}\\
&\leq L \tag{60}
\end{align}

Optimizing the bound over $P_{\bar{Y}}$ and choosing $P_{X|S}$ that yields the weakest bound in
order to obtain a code-independent converse, (54) follows. ■

Remark 3. Similar to how Wolfowitz's converse for channel coding can be obtained from the
meta-converse for channel coding [14], the converse for almost-lossless joint source-channel
coding in (34) can be obtained by appropriately weakening (54) with $L = 1$. Indeed, invoking [14]

$$\beta_\alpha(P, Q) \geq \frac{1}{\gamma} \left( \alpha - \mathbb{P}\left[\frac{dP}{dQ} > \gamma\right] \right) \tag{61}$$

and letting $Q_S = U_S$ in (54), where $U_S$ is the counting measure on $\mathcal{M}$, we have

\begin{align}
1 &\geq \inf_{P_{X|S}} \sup_{P_{\bar{Y}}} \beta_{1-\epsilon}\left(P_S P_{X|S} P_{Y|X},\ U_S P_{X|S} P_{\bar{Y}}\right) \tag{62}\\
&\geq \inf_{P_{X|S}} \sup_{P_{\bar{Y}}} \sup_{\gamma > 0} \frac{1}{\gamma} \left( 1 - \epsilon - \mathbb{P}\left[\imath_{X;\bar{Y}}(X; Y) - \imath_S(S) > \log \gamma\right] \right) \tag{63}
\end{align}

which upon rearranging yields (34).

In general, computing the infimum in (54) is challenging. However, if the channel is symmetric
(in a sense formalized in the next result),
$\beta_{1-\epsilon}\left(P_S P_{X|S} P_{Y|X},\ U_S P_{X|S} P_{\bar{Y}}\right)$ is independent of
$P_{X|S}$.

Theorem 5 (Converse). Fix a probability measure $P_{\bar{Y}}$. Assume that the distribution of
$\imath_{X;\bar{Y}}(x; Y)$ does not depend on $x \in \mathcal{X}$ under either $P_{Y|X=x}$ or
$P_{\bar{Y}}$. Then, the existence of an $(\epsilon, L, Q_S)$ list code requires that

$$\beta_{1-\epsilon}\left(P_S P_{Y|X=x},\ Q_S P_{\bar{Y}}\right) \leq L \tag{64}$$

where $x \in \mathcal{X}$ is arbitrary.

Proof: For a given observation, the outcome of the optimum binary hypothesis test between $P$ and
$Q$ depends only on $\frac{dP}{dQ}$ [15]. In particular, the optimum binary hypothesis test
$W^\star$ for deciding between $P_S P_{X|S} P_{Y|X}$ and $Q_S P_{X|S} P_{\bar{Y}}$ satisfies

$$W^\star - (S, \imath_{X;\bar{Y}}(X; Y)) - (S, X, Y) \tag{65}$$

For all $s \in \mathcal{M}$, $x \in \mathcal{X}$, we have

\begin{align}
\mathbb{P}[W^\star = 1 \mid S = s, X = x]
&= \mathbb{E}\left[\mathbb{P}\left[W^\star = 1 \mid X = x, S = s, Y, \imath_{X;\bar{Y}}(x; Y)\right]\right] \tag{66}\\
&= \mathbb{E}\left[\mathbb{P}\left[W^\star = 1 \mid S = s, \imath_{X;\bar{Y}}(x; Y)\right]\right] \tag{67}\\
&= \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x)\, P_{W^\star | S,\, \imath_{X;\bar{Y}}(X;Y)}\left(1 \mid s, \imath_{X;\bar{Y}}(x; y)\right) \tag{68}\\
&= \mathbb{P}[W^\star = 1 \mid S = s] \tag{69}
\end{align}

and

$$\mathbb{Q}[W^\star = 1 \mid S = s, X = x] = \mathbb{Q}[W^\star = 1 \mid S = s] \tag{70}$$

where

• (67) is due to (65),

• (68) uses the Markov property $S - X - Y$,

• (69) follows from the symmetry assumption on the distribution of $\imath_{X;\bar{Y}}(x; Y)$,

• (70) is obtained similarly to (68).

Since (69), (70) imply that the optimal test achieves the same performance (that is, the same
$\mathbb{P}[W^\star = 1]$ and $\mathbb{Q}[W^\star = 1]$) regardless of $P_{X|S}$, we choose
$P_{X|S}$ to be the point mass at some $x \in \mathcal{X}$ in the left side of (54) to obtain
(64). ■

Remark 4. In the case of finite channel input and output alphabets, the channel symmetry
assumption of Theorem 5 holds, in particular, if the rows of the channel transition probability
matrix are permutations of each other, and $P_{\bar{Y}^n}$ is the equiprobable distribution on the
($n$-dimensional) channel output alphabet, which, coincidentally, is also the capacity-achieving
output distribution. For Gaussian channels with equal power constraint, which corresponds to
$\mathcal{X} = \{x \colon \|x\|^2 = nP\}$, any spherically-symmetric $P_{\bar{Y}^n}$ satisfies the
assumption of Theorem 5.

IV. Achievability

Given a source code $(f_s^{(M)}, g_s^{(M)})$ of size $M$, and a channel code
$(f_c^{(M)}, g_c^{(M)})$ of size $M$, we may concatenate them to obtain the following sub-class of
the source-channel codes introduced in Definition 1:

Definition 8. An $(M, d, \epsilon)$ source-channel code is a $(d, \epsilon)$ source-channel code
such that the encoder and decoder mappings satisfy

\begin{align}
f &= f_c^{(M)} \circ f_s^{(M)} \tag{71}\\
g &= g_s^{(M)} \circ g_c^{(M)} \tag{72}
\end{align}

where

\begin{align}
f_s^{(M)} &\colon \mathcal{M} \mapsto \{1, \ldots, M\} \tag{73}\\
f_c^{(M)} &\colon \{1, \ldots, M\} \mapsto \mathcal{X} \tag{74}\\
g_c^{(M)} &\colon \mathcal{Y} \mapsto \{1, \ldots, M\} \tag{75}\\
g_s^{(M)} &\colon \{1, \ldots, M\} \mapsto \widehat{\mathcal{M}} \tag{76}
\end{align}

(see Fig. 2).

Fig. 2. A $(d, \epsilon)$ joint source-channel code. [Block diagram: $S \to f_s^{(M)} \to \{1, \ldots, M\} \to f_c^{(M)} \to X \to P_{Y|X} \to Y \to g_c^{(M)} \to \{1, \ldots, M\} \to g_s^{(M)} \to Z$, with $\mathbb{P}[\mathsf{d}(S, Z) > d] \leq \epsilon$.]

The conventional separate source-channel coding paradigm corresponds to the special case of
Definition 8 in which the source code $(f_s^{(M)}, g_s^{(M)})$ is chosen without knowledge of
$P_{Y|X}$ and the channel code $(f_c^{(M)}, g_c^{(M)})$ is chosen without knowledge of $P_S$ and the
distortion measure $\mathsf{d}$. If, in fact, both the source and the channel code are chosen
optimally (i.e. with minimal distortion and error probability, respectively) for their given
sizes, the separation principle guarantees that under certain conditions (which encompass the
memoryless setting in this paper, see [16]) the asymptotic fundamental limit of joint
source-channel coding is achievable. In the finite blocklength regime, however, such an SSCC
construction is, in general, only suboptimal. Within the SSCC paradigm, we can obtain an
achievability result by further optimizing with respect to the choice of $M$:

Theorem 6 (Achievability, SSCC). Fix $P_{Y|X}$, $\mathsf{d}$ and $P_S$. Denote by
$\epsilon^\star(M)$ the minimum achievable maximal error probability among all transmission codes
of size $M$, and the minimum achievable probability of exceeding distortion $d$ with a source code
of size $M$ by $\epsilon^\star(M, d)$. Then, there exists a $(d, \epsilon)$ source-channel code
with

$$\epsilon \leq \min_{M} \left\{ \epsilon^\star(M) + \epsilon^\star(M, d) \right\} \tag{77}$$

Bounds on $\epsilon^\star(M)$ and $\epsilon^\star(M, d)$ have been obtained recently in [14] and [12], respectively.

Definition 8 does not rule out choosing the source code based on the knowledge of $P_{Y|X}$ or
the channel code based on the knowledge of $P_S$, $\mathsf{d}$ and $d$. One of the interesting
conclusions in the present paper is that the optimal dispersion of JSCC is achievable within the
class of $(M, d, \epsilon)$ source-channel codes introduced in Definition 8. However, the
dispersion achieved by the conventional SSCC approach is in fact suboptimal.

To shed light on the reason behind the suboptimality of SSCC at finite blocklength despite its
asymptotic optimality, consider a source code that, most of the time, encodes the source output
within distortion $d > D\left(\frac{nC}{k}\right)$, for a fixed ratio $\frac{k}{n}$. Recall that
the reason SSCC achieves the asymptotic fundamental limit is that the output of the source encoder
is, for large $k$, approximately equiprobable over a set of roughly $\exp(kR(d))$ distinct
messages. From the channel coding theorem we know that there exists a channel code with the
maximum likelihood decoder that is capable of distinguishing, with high probability,
$M = \exp(kR(d)) < \exp(nC)$ messages. Therefore, simply putting the source code and the channel
code together asymptotically results in the maximum distortion $d$. Since $d$ can be chosen to be
arbitrarily close to its optimum value $D\left(\frac{nC}{k}\right)$, such a separated scheme is
asymptotically optimal.

However, at finite $n$, the output of the optimum source encoder is not, in general, equiprobable
or nearly equiprobable, so there is no reason to expect that a separated scheme employing
a channel code equipped with the maximum-likelihood detector would achieve the optimum
nonasymptotic performance. Indeed, in the non-asymptotic regime the gain afforded by taking
into account the residual encoded source redundancy at the channel decoder is appreciable. The
following achievability result, obtained using independent random source codes and random
channel codes within the paradigm of Definition 8, capitalizes on this intuition.

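Theorem 7 (Achievability). For every positive integer $M$, there exists an $(M, d, \epsilon)$
source-channel code with

$$\epsilon \leq \inf_{P_X, P_Z, \gamma > 0} \left\{ \mathbb{E}\left[\exp\left(-\left|\imath_{X;Y}(X; Y) - \log \frac{\gamma H_M}{P_Z(B_d(S))}\right|^+\right)\right] + \mathbb{E}\left[\left|e^{1-\gamma} - (1 - P_Z(B_d(S)))^M\right|^+\right] + \mathbb{E}\left[(1 - P_Z(B_d(S)))^M\right] \right\} \tag{78}$$

where the expectations are with respect to $P_S P_X P_{Y|X} P_Z$ defined on
$\mathcal{M} \times \mathcal{X} \times \mathcal{Y} \times \widehat{\mathcal{M}}$, and

$$H_M = \sum_{m=1}^{M} \frac{1}{m} \tag{79}$$

is the $M$-th harmonic number.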



Proof: We construct a code with separate encoders for source and channel and separate
decoders for source and channel as in Definition 8. We will perform a random coding analysis
by choosing random independent source and channel codes, which will lead to the conclusion
that a pair of source and channel codes results in the error probability guaranteed in (78).

Source Encoder. Given an ordered list of representation points
$z^M = (z_1, \ldots, z_M) \in \widehat{\mathcal{M}}^M$, the source encoder selects the lowest index
$m \in \{1, \ldots, M\}$ such that the source outcome is within distance $d$ of $z_m$. If no such
index can be found, the source encoder outputs an arbitrary index, e.g. $M$. Therefore,

$$f_s^{(M)}(s) = \begin{cases} m & \mathsf{d}(s, z_m) \leq d < \min_{i = 1, \ldots, m-1} \mathsf{d}(s, z_i) \\ M & d < \min_{i = 1, \ldots, M-1} \mathsf{d}(s, z_i) \end{cases} \tag{80}$$

In a good $(M, d, \epsilon)$ JSCC code, $M$ would be chosen so large that with overwhelming
probability, a source outcome would be encoded successfully within distortion $d$.

Channel Encoder. Given a codebook $(x_1, \ldots, x_M) \in \mathcal{X}^M$, the channel encoder
outputs $x_m$ if $m$ is the output of the source encoder:

$$f_c^{(M)}(m) = x_m \tag{81}$$

Channel Decoder. Having observed $y \in \mathcal{Y}$, the channel decoder chooses arbitrarily among
the members of the set:

$$g_c^{(M)}(y) = m \in \arg\max_{j \in \{1, \ldots, M\}} \frac{P_{Y|X}(y|x_j)}{j} \tag{82}$$

A MAP decoder would multiply $P_{Y|X}(y|x_j)$ by $P_X(x_j)$. While that decoder would be too hard
to analyze, the ratio in (82) is a good approximation because, averaging over all source
codebooks, the probability that the $j$-th index is chosen is approximately proportional to
$j^{-1}$ (e.g. [17, (3)]).

Source Decoder. The source decoder outputs $z_m$ if $m$ is the output of the channel decoder:

$$g_s^{(M)}(m) = z_m \tag{83}$$
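The four mappings (80)-(83) can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction analyzed in the proof: the codebooks are fixed rather than random, ties in (82) are broken in favor of the lowest index (one admissible "arbitrary" choice), and the toy example uses binary tuples with normalized Hamming distortion and a BSC likelihood. All function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Bits = Tuple[int, ...]


def source_encode(s: Bits, reps: Sequence[Bits],
                  distortion: Callable[[Bits, Bits], float], d: float) -> int:
    """(80): lowest 1-based index whose representation point is within distortion d; else M."""
    for m, z in enumerate(reps, start=1):
        if distortion(s, z) <= d:
            return m
    return len(reps)


def channel_encode(m: int, codebook: Sequence[Bits]) -> Bits:
    """(81): send the m-th codeword."""
    return codebook[m - 1]


def channel_decode(y: Bits, codebook: Sequence[Bits],
                   likelihood: Callable[[Bits, Bits], float]) -> int:
    """(82): index maximizing P(y | x_j) / j; max() returns the lowest index on ties."""
    return max(range(1, len(codebook) + 1),
               key=lambda j: likelihood(y, codebook[j - 1]) / j)


def source_decode(m: int, reps: Sequence[Bits]) -> Bits:
    """(83): output the m-th representation point."""
    return reps[m - 1]


def hamming(a: Bits, b: Bits) -> float:
    """Normalized Hamming (bit error rate) distortion."""
    return sum(u != v for u, v in zip(a, b)) / len(a)


def bsc_likelihood(y: Bits, x: Bits, delta: float = 0.1) -> float:
    """P(y | x) for a memoryless BSC with crossover probability delta."""
    flips = sum(u != v for u, v in zip(y, x))
    return delta**flips * (1.0 - delta) ** (len(y) - flips)


# Toy run with M = 2: the channel flips one of five bits.
reps = [(0, 0, 0), (1, 1, 1)]
codebook = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
s = (1, 1, 0)
m = source_encode(s, reps, hamming, 1 / 3)
x = channel_encode(m, codebook)
y = (1, 0, 1, 1, 1)
m_hat = channel_decode(y, codebook, bsc_likelihood)
z = source_decode(m_hat, reps)
```

Dividing the likelihood by $j$ biases the channel decoder toward low indices, which is where the lowest-index source encoder of (80) concentrates its output.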

Error Probability Analysis. We now proceed to analyze the performance of the code described 
above. A channel decoding error can occur if and only if 

: MW>W (84) 

j m 

Let the channel codebook (X 1 ,...,Xm) be drawn i.i.d. from P x , and independent of the 
source codebook (Zi, . . . , Z M ), which is drawn i.i.d. from P z . Denote by e(x M , z M ) the excess- 
distortion probability attained with the source codebook z M and the channel codebook x M . 
Define the random variable U E {1, . . . , M + 1} which is a function of S and z M only: 

\i M \s) d(s, gs (us))<d 

U = I (85) 
I M + 1 otherwise 

Conditioned on the event $\{d(S, g_s^{(M)}(f_s^{(M)}(S))) \leq d\}$ (no failure at the source encoder), the probability
of excess distortion is upper bounded by the probability that the channel decoder does not choose
$f_s^{(M)}(S)$, so

$\epsilon(x^M, z^M) \leq \sum_{m=1}^{M} P_{U|Z^M}(m|z^M)\, \mathbb{P}\left[ \bigcup_{j \neq m} \left\{ \frac{m P_{Y|X}(Y|x_j)}{j P_{Y|X}(Y|x_m)} \geq 1 \right\} \,\Big|\, X = x_m \right] + P_{U|Z^M}(M+1|z^M)$ (86)

We now average (86) over the source and channel codebooks. Averaging the $m$-th term of the
sum in (86) with respect to the channel codebook yields

$P_{U|Z^M}(m|z^M)\, \mathbb{P}\left[ \bigcup_{j \neq m} \left\{ \frac{m P_{Y|X}(Y|X_j)}{j P_{Y|X}(Y|X_m)} \geq 1 \right\} \right]$ (87)

where $Y, X_1, \ldots, X_M$ are distributed according to

$P_{Y X_1 \ldots X_M}(y, x_1, \ldots, x_M) = P_{Y|X_m}(y|x_m) \prod_{j=1}^{M} P_X(x_j)$ (88)



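Letting $\bar{X}$ be an independent copy of $X$ and applying the union bound to the probability in
(87), we have

$\mathbb{P}\left[ \bigcup_{j \neq m} \left\{ \frac{m P_{Y|X}(Y|X_j)}{j P_{Y|X}(Y|X_m)} \geq 1 \right\} \right]$

$\leq \mathbb{E}\left[ \min\left\{ 1, \sum_{j=1}^{M} \mathbb{P}\left[ \frac{m P_{Y|X}(Y|\bar{X})}{j P_{Y|X}(Y|X)} \geq 1 \,\Big|\, X, Y \right] \right\} \right]$ (89)

$\leq \mathbb{E}\left[ \min\left\{ 1, \sum_{j=1}^{M} \frac{m\, \mathbb{E}\left[ P_{Y|X}(Y|\bar{X}) \,|\, Y \right]}{j P_{Y|X}(Y|X)} \right\} \right]$ (90)

$= \mathbb{E}\left[ \min\left\{ 1, \frac{m P_Y(Y)}{P_{Y|X}(Y|X)} \sum_{j=1}^{M} \frac{1}{j} \right\} \right]$ (91)

$= \mathbb{E}\left[ \exp\left( -\left| \imath_{X;Y}(X;Y) - \log m - \log \sum_{j=1}^{M} \frac{1}{j} \right|^+ \right) \right]$ (92)

where

• (90) is due to $1\{a \geq 1\} \leq a$;

• (92) is due to $\min\{1, a\} = \exp\left( -\left| \log \frac{1}{a} \right|^+ \right)$, where $a$ is nonnegative.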

On the other hand, conditioned on $S = s$ and averaged over the source codebook, $U$ is distributed
as:

$P_{U|S}(m|s) = \begin{cases} p(s)(1 - p(s))^{m-1} & m = 1, 2, \ldots, M \\ (1 - p(s))^{M} & m = M + 1 \end{cases}$ (93)

where we denoted for brevity

$p(s) = P_Z(B_d(s))$ (94)
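The geometric law in (93) is easy to sanity-check numerically. The sketch below is illustrative only: the function names `u_pmf` and `tail_prob` and the numerical values of $p(s)$, $M$ and $\gamma$ are assumptions chosen for the example, not quantities from the paper. It confirms that the probabilities in (93) sum to one and that the mass of $\{\gamma / p(s) < U \leq M\}$ stays below $e^{1-\gamma}$, the kind of tail estimate used later in the proof.

```python
import math

def u_pmf(p: float, M: int) -> list:
    """P[U = m | S = s] for m = 1, ..., M + 1, following (93); p stands for p(s)."""
    pmf = [p * (1.0 - p) ** (m - 1) for m in range(1, M + 1)]
    pmf.append((1.0 - p) ** M)  # m = M + 1: no codeword lands within distortion d
    return pmf

def tail_prob(p: float, M: int, gamma: float) -> float:
    """P[gamma / p < U <= M | S = s], summed directly from the pmf."""
    pmf = u_pmf(p, M)
    return sum(q for m, q in enumerate(pmf[:M], start=1) if m > gamma / p)

if __name__ == "__main__":
    p, M, gamma = 0.02, 500, 3.0  # hypothetical values, for illustration only
    print("total mass:", sum(u_pmf(p, M)))
    print("tail:", tail_prob(p, M, gamma), "bound:", math.exp(1.0 - gamma))
```

Increasing `gamma` shrinks the tail geometrically, which is why a moderate $\gamma$ already makes the corresponding term negligible.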
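Averaging (86) further over the source codebook and using (92) and (93), we conclude

$\mathbb{E}[\epsilon(Z^M, X^M)] \leq \inf_{P_X, P_Z} \mathbb{E}\left[ \exp\left( -\left| \imath_{X;Y}(X;Y) - \log U - \log \sum_{j=1}^{M} \frac{1}{j} \right|^+ \right) 1\{U \leq M\} \right] + \mathbb{E}\left[ (1 - p(S))^M \right]$ (95)

Finally, we further upper bound the first expectation in the right side of (95) to make it more
easily computable and analyzable. To that end, note that regardless of whether $j \leq M$, (93)
results in

$\mathbb{P}[j < U \leq M \,|\, S = s] = \left| (1 - p(s))^{j} - (1 - p(s))^{M} \right|^+ \leq (1 - p(s))^{j}$ (96)

Fix an arbitrary $\gamma > 0$ and observe that

$(1 - p(s))^{\lfloor \gamma / p(s) \rfloor} \leq (1 - p(s))^{\frac{\gamma}{p(s)} - 1}$ (97)
$\leq e^{-p(s)\left( \frac{\gamma}{p(s)} - 1 \right)}$ (98)
$\leq e^{1 - \gamma}$ (99)

So, letting $j = \left\lfloor \frac{\gamma}{p(s)} \right\rfloor$, we obtain from (96) and (99) that

$\mathbb{P}\left[ \frac{\gamma}{p(S)} < U \leq M \,\Big|\, S = s \right] \leq e^{1 - \gamma}$ (100)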



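Using

$1\{U \leq M\} \leq 1\left\{ U \leq \frac{\gamma}{p(S)} \right\} + 1\left\{ \frac{\gamma}{p(S)} < U \leq M \right\}$ (101)

and (100), we upper bound the first term in (95) as

$\mathbb{E}\left[ \exp\left( -\left| \imath_{X;Y}(X;Y) - \log U - \log \sum_{j=1}^{M} \frac{1}{j} \right|^+ \right) 1\{U \leq M\} \right]$ (102)

$\leq \mathbb{E}\left[ \exp\left( -\left| \imath_{X;Y}(X;Y) - \log U - \log \sum_{j=1}^{M} \frac{1}{j} \right|^+ \right) 1\left\{ U \leq \frac{\gamma}{p(S)} \right\} \right] + \mathbb{P}\left[ \frac{\gamma}{p(S)} < U \leq M \right]$ (103)

$\leq \mathbb{E}\left[ \exp\left( -\left| \imath_{X;Y}(X;Y) - \log \frac{\gamma}{p(S)} - \log \sum_{j=1}^{M} \frac{1}{j} \right|^+ \right) \right] + e^{1 - \gamma}$ (104)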



Finally, (78) follows by weakening (95) by means of (104) and invoking Shannon's random
coding argument. ■

The code size $M$ that leads to tight achievability bounds following from Theorem 7 is in
general quite different from the size that achieves the minimum in (77). In that case, $M$ is chosen
so that $\log M$ lies between $kR(d)$ and $nC$ so as to minimize the sum of source and channel
decoding error probabilities without the benefit of a channel decoder that exploits residual source
redundancy. In contrast, Theorem 7 is obtained with an approximate MAP decoder that allows a
larger choice for $\log M$, even beyond $nC$. Still we can achieve a good $(d, \epsilon)$ tradeoff because the






channel code employs unequal error protection: those codewords with lower indices are more 
reliably decoded. 

In the case of almost-lossless JSCC, the bound in Theorem 7 can be sharpened as shown
recently by Tauste Campo et al. [9].

Theorem 8 (Achievability, almost-lossless JSCC [9]). There exists a $(0, \epsilon)$ code with

$\epsilon \leq \inf_{P_X} \mathbb{E}\left[ \exp\left( -\left| \imath_{X;Y}(X;Y) - \imath_S(S) \right|^+ \right) \right]$ (105)

where the expectation is with respect to $P_S P_X P_{Y|X}$ defined on $\mathcal{M} \times \mathcal{X} \times \mathcal{Y}$.

V. Gaussian Approximation

In addition to the basic conditions (a)-(c) of Section III, in this section we impose the following
restrictions.

(i) The channel is stationary and memoryless, $P_{Y^n|X^n} = P_{Y|X} \times \ldots \times P_{Y|X}$.

(ii) The source is stationary and memoryless, $P_{S^k} = P_S \times \ldots \times P_S$, and the distortion measure
is separable, $d_k(s^k, z^k) = \frac{1}{k} \sum_{i=1}^{k} d(s_i, z_i)$.

(iii) The distortion level satisfies $d_{\min} < d < d_{\max}$, where $d_{\min}$ is defined in ©, and $d_{\max} =
\inf_{z \in \hat{\mathcal{S}}} \mathbb{E}[d(S, z)]$, where the average is with respect to the unconditional distribution of $S$.
The excess-distortion probability satisfies $0 < \epsilon < 1$.

(iv) $\mathbb{E}[d^9(S, Z^*)] < \infty$ where the average is with respect to $P_S \times P_{Z^*}$ and $P_{Z^*}$ is the output
distribution corresponding to the minimizer in Q.

Conditions (i) and (ii) are standard in the memoryless joint source-channel coding problem setup.
The technical condition (iv) ensures applicability of the Gaussian approximation in Theorem 9
below.

Theorem 9 (Gaussian approximation). Under restrictions (i)-(iv), the parameters of the optimal
$(k, n, d, \epsilon)$ code satisfy

$nC - kR(d) = \sqrt{nV + kV(d)}\, Q^{-1}(\epsilon) + \theta(n)$ (106)

where

$V(d) = \mathrm{Var}\left[ \jmath_S(S, d) \right]$ (107)

and $k = O(n)$,

1) If $\mathcal{A}$ and $\mathcal{B}$ are finite and the channel has no cost constraints,

$V = \mathrm{Var}\left[ \imath^*_{X;Y}(X^*; Y^*) \right]$ (108)

$\imath^*_{X;Y}(x; y) = \log \frac{dP_{Y|X=x}}{dP_{Y^*}}(y)$ (109)

where $X^*$, $Y^*$ are the capacity-achieving input and output random variables.

2) If the channel is Gaussian with either equal or maximal power constraint,

$V = \frac{1}{2}\left( 1 - \frac{1}{(1 + P)^2} \right) \log^2 e$ (110)

where $P$ is the signal-to-noise ratio.
3) IfV > 0, 



where 



-clog n + 0(1) < 9{n) (111) 
< clog n + log log n + O (1) (112) 



c=\A~\ (H3) 



Var[A Y .(X,A*)] 
+ E[|A Y *(X,A*)|]loge 

In (|1 141) . (■)' denotes differentiation with respect to A, A Y *(x, A) is defined by 



(114) 



Ay * (X ' A) = bg E[exp(Ad-Ad(x,Z*))] (U5) 
(c/ Definition^ and A* = —R'(d). 

4) IfV = 0, (II 121) stz'/Z holds, while (II 1 II) is replaced with 

o(Vn)<9(n) (116) 

5) If the channel is such that the (conditional) distribution o/zx. Y (x;Y) Joes no? depend on 
x (z X or Gaussian with either equal or maximal power constraint, then c = \. 

6) In the almost-lossless case, $R(d) = H(S)$, and provided that the third absolute moment of
$\imath_S(S)$ is finite, (106) and (111) still hold, while (112) strengthens to

$\theta(n) \leq \frac{1}{2} \log n + O(1)$ (117)

Proof: The asymptotic analysis of the bounds in Theorems 2 (converse, symmetric channel),
3 (converse, general lossy coding), 4 (converse, lossless coding) and 7 (achievability) is detailed



Remark 5. If the channel and the data compression codes are designed separately, we can invoke
channel coding [14] and lossy compression [12] results to show that

$\ldots + O(\log(n + k))$ (118)

Comparing (118) to (106), observe that if either the channel or the source (or both) have
zero dispersion, the joint source-channel coding dispersion can be achieved by separate coding.
In that special case, either the $d$-tilted information or the channel information density is so close to being
deterministic that there is no need to account for the true distributions of these random variables,
as a good joint source-channel code would do.

The Gaussian approximations of JSCC and SSCC in (106) and (118) admit the following
heuristic interpretation when $n$ is large (and thus so is $k$). Since the source is stationary
and memoryless, the normalized $d$-tilted information $J = \frac{1}{n} \jmath_{S^k}(S^k, d)$ becomes approximately
Gaussian with mean $\frac{k}{n} R(d)$ and variance $\frac{k}{n} \frac{V(d)}{n}$. Likewise, the normalized channel information
density $I = \frac{1}{n} \imath_{X^n;Y^n}(X^n; Y^n)$ is, for large $k$, $n$, approximately Gaussian with mean $C$ and
variance $\frac{V}{n}$. Since the source is independent of the channel, the random variable $I - J$ is
approximately Gaussian with mean $C - \frac{k}{n} R(d)$ and variance $\frac{1}{n}\left( \frac{k}{n} V(d) + V \right)$, and (106) reflects
the intuition that under JSCC, the source is reconstructed successfully within distortion $d$ if and
only if the channel information density exceeds the source $d$-tilted information, that is, $\{I > J\}$.
In contrast, in SSCC, the source is reconstructed successfully if $(I, J)$ falls into the intersection
of half-planes $\{I > r\} \cap \{J < r\}$ for some $r = \frac{\log M}{n}$, which is the capacity of the noiseless
link between the source and the channel code block and which can be chosen so as to maximize the
probability of that intersection, as reflected in (118). Since in JSCC the successful transmission
event is strictly larger than in SSCC, i.e. $\{I > r\} \cap \{J < r\} \subset \{I > J\}$, separate source/channel
code design incurs a performance loss. It is worth pointing out that $\{I > J\}$ leads to successful
reconstruction even within the paradigm of the codes in Definition 8 because, as explained after
the proof of Theorem 7, unlike the SSCC case, it is not necessary that $\frac{\log M}{n}$ lie between $I$ and
$J$ for successful reconstruction.
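The heuristic comparison of the events $\{I > J\}$ and $\{I > r\} \cap \{J < r\}$ can be made concrete with the Gaussian proxies described above. The following is only a sketch under those proxies, not an evaluation of the actual bounds: the function name, the grid search over $r$, and the numerical parameters in the usage note are assumptions introduced for illustration, and both dispersions are assumed strictly positive.

```python
import math
from statistics import NormalDist

def success_probabilities(n, k, C, V, R, Vd, grid=2000):
    """Gaussian proxies: I ~ N(C, V/n) and J ~ N(kR/n, kV(d)/n^2), independent.

    Returns (P[I > J], max over r of P[I > r] * P[J < r]).
    Assumes V > 0 and Vd > 0.
    """
    std = NormalDist()
    mu_i, sd_i = C, math.sqrt(V / n)
    mu_j, sd_j = k * R / n, math.sqrt(k * Vd) / n
    # JSCC proxy: success iff I > J; I - J is Gaussian.
    jscc = std.cdf((mu_i - mu_j) / math.hypot(sd_i, sd_j))
    # SSCC proxy: success iff I > r and J < r; search r on a uniform grid.
    lo = min(mu_i, mu_j) - 6 * max(sd_i, sd_j)
    hi = max(mu_i, mu_j) + 6 * max(sd_i, sd_j)
    sscc = max(
        std.cdf((mu_i - r) / sd_i) * std.cdf((r - mu_j) / sd_j)
        for r in (lo + (hi - lo) * t / grid for t in range(grid + 1))
    )
    return jscc, sscc
```

For instance, `success_probabilities(500, 450, 0.5, 0.89, 0.4355, 0.64)` (hypothetical values in bits) returns a JSCC proxy that exceeds the SSCC proxy, in line with the inclusion of events noted in the text.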

Remark 6. Using Theorem 9, it can be shown that



where the rate-dispersion function of JSCC is found as (recall Definition 4),



Remark 7. Under regularity conditions similar to those in [12, Theorem 14], it can be shown 
that 

where the distortion-dispersion function of JSCC is given by 

W{R)=(^^\ (V + RV(D(C))) (122) 



R 

Remark 8. If the basic conditions © and/or flc]) fail so that there are several distributions Pz*\s 
and/or several Px* that achieve the rate-distortion function and the capacity, then 

V(d) < mmV z *. x *(d) (123) 

W(d) < minW z * ; x*(rf) (124) 

where the minimum is taken over Pz*\s an d Px*, and Vz* ] x*(d) (resp. Wz* ; x*(^)) denotes (11201) 
(resp. (11221) ) computed with Pz*\s an d Px*- The reason for possibly lower achievable dispersion 
in this case is that we have the freedom to map the unlikely source realizations leading to high 
probability of failure to those codewords resulting in the maximum variance so as to increase 
the probability that the channel output escapes the decoding failure region. 

Remark 9. The dispersion of the Gaussian channel is given by (110), regardless of whether an
equal or a maximal power constraint is imposed. An equal power constraint corresponds to the
subset of allowable channel inputs being the power sphere:

$\mathcal{F}(P) = \left\{ x^n \in \mathbb{R}^n \colon \frac{|x^n|^2}{\sigma_N^2} = nP \right\}$ (125)

where $\sigma_N^2$ is the noise power. In a maximal power constraint, (125) is relaxed replacing '=' with
'≤'.

Writing eq for the equal and max for the maximal power constraint, we remark that the
bounds for the latter can be obtained from the bounds for the former via the following relation

$k^*_{\mathrm{eq}}(n, d, \epsilon) \leq k^*_{\max}(n, d, \epsilon) \leq k^*_{\mathrm{eq}}(n + 1, d, \epsilon)$ (126)

where the right-most inequality is due to the following idea dating back to Shannon: a $(k, n, d, \epsilon)$
code with a maximal power constraint can be converted to a $(k, n + 1, d, \epsilon)$ code with an equal
power constraint by appending an $(n + 1)$-th coordinate to each codeword to equalize its total
power to $n \sigma_N^2 P$. From (126) it is immediate that the channel dispersions for maximal or equal
power constraints must be the same.
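The padding argument behind the right-most inequality in (126) amounts to a one-line computation. The sketch below illustrates it under the assumption that codewords are plain lists of real numbers; the helper name `equalize_power` and the sample codeword are hypothetical.

```python
import math

def equalize_power(codeword, total_power):
    """Append one coordinate so that the squared Euclidean norm equals total_power.

    total_power plays the role of n * sigma_N^2 * P in the text.
    """
    energy = sum(v * v for v in codeword)
    if energy > total_power * (1.0 + 1e-12):
        raise ValueError("codeword violates the maximal power constraint")
    # The extra coordinate absorbs whatever power budget is left unused.
    return list(codeword) + [math.sqrt(max(total_power - energy, 0.0))]

if __name__ == "__main__":
    n, sigma2, P = 4, 1.0, 2.0            # illustrative values only
    x = [1.0, -2.0, 0.5, 1.0]             # energy 6.25 <= n * sigma2 * P = 8
    y = equalize_power(x, n * sigma2 * P)
    print(len(y), sum(v * v for v in y))  # length n + 1, energy equal to the budget
```

The first $n$ coordinates are untouched, so a decoder that ignores the appended coordinate sees the same channel outputs as before.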

VI. Lossy transmission of a BMS over a BSC

In this section we particularize the bounds in Sections III and IV and the approximation in Section
V to the transmission of a BMS with bias $p$ over a BSC with crossover probability $\delta$. The target
bit error rate satisfies $d < p$.

The rate-distortion function of the source and the channel capacity are given by, respectively,

$R(d) = h(p) - h(d)$ (127)
$C = 1 - h(\delta)$ (128)

The source and the channel dispersions are given by [12], [14]:

$V(d) = p(1 - p) \log^2 \frac{1 - p}{p}$ (129)

$V = \delta(1 - \delta) \log^2 \frac{1 - \delta}{\delta}$ (130)

where note that (129) does not depend on $d$.
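As a rough numerical illustration of how (127)-(130) enter the Gaussian approximation (106), the sketch below evaluates the four quantities in bits and finds the largest $k$ compatible with (106) when the remainder term is set to zero. The function names and the linear search are assumptions made for this example; the search presumes $R(d) > 0$ and $\epsilon < \frac{1}{2}$.

```python
import math
from statistics import NormalDist

def h(x):
    """Binary entropy function in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def bms_bsc_params(p, delta, d):
    """Return (R(d), C, V(d), V) from (127)-(130), all in bits."""
    R = h(p) - h(d)
    C = 1.0 - h(delta)
    Vd = p * (1.0 - p) * math.log2((1.0 - p) / p) ** 2
    V = delta * (1.0 - delta) * math.log2((1.0 - delta) / delta) ** 2
    return R, C, Vd, V

def max_k_gaussian(n, eps, R, C, Vd, V):
    """Largest k with n*C - k*R >= sqrt(n*V + k*V(d)) * Qinv(eps): (106) with zero remainder."""
    q_inv = NormalDist().inv_cdf(1.0 - eps)
    k = 0
    while n * C - (k + 1) * R >= math.sqrt(n * V + (k + 1) * Vd) * q_inv:
        k += 1
    return k

if __name__ == "__main__":
    R, C, Vd, V = bms_bsc_params(p=0.5, delta=0.11, d=0.11)
    print(max_k_gaussian(1000, 0.01, R, C, Vd, V) / 1000)  # approximate rate k/n
```

With a fair source the source dispersion vanishes, so the backoff from the asymptotic ratio $C / R(d)$ is driven by the channel dispersion alone.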

Throughout the section, $w(a^\ell)$ denotes the Hamming weight of the binary $\ell$-vector $a^\ell$, and
$T_\alpha^\ell$ denotes a binomial random variable with parameters $\ell$ and $\alpha$, independent of all other random
variables. In addition, the binomial sum is denoted by

$\left\langle {k \atop \ell} \right\rangle = \sum_{i=0}^{\ell} \binom{k}{i}$ (131)

A straightforward particularization of the $d$-tilted information converse in Theorem 2 leads to
the following result.

Theorem 10 (Converse, BMS-BSC). Any $(k, n, d, \epsilon)$ code for transmission of a BMS with bias
$p$ over a BSC with bias $\delta$ must satisfy

$\epsilon \geq \sup_{\gamma > 0} \left\{ \mathbb{P}\left[ (T_p^k - kp) \log \frac{1 - p}{p} + (T_\delta^n - n\delta) \log \frac{1 - \delta}{\delta} \geq nC - kR(d) + \gamma \right] - \exp(-\gamma) \right\}$ (132)

Note that the terms to the left of the '≥' sign inside the probability in (132) are zero-mean
random variables whose variances are equal to $kV(d)$ and $nV$, respectively.
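Because both random variables in (132) are binomial, the bound can be evaluated exactly for moderate blocklengths. The sketch below is one way to do so and is not taken from the paper: the function names, the use of natural logarithms throughout (so that $\exp(-\gamma)$ is $e^{-\gamma}$), and the finite grid of $\gamma$ values replacing the supremum are all assumptions. It also assumes $0 < d < 1$, $0 < p < 1$ and $0 < \delta < \frac{1}{2}$.

```python
import math

def binom_pmf(n, q):
    """Probability mass function of a Binomial(n, q) variable (moderate n only)."""
    return [math.comb(n, t) * q**t * (1.0 - q) ** (n - t) for t in range(n + 1)]

def h_nats(x):
    """Binary entropy in nats, for 0 < x < 1."""
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def converse_bms_bsc(k, n, d, p, delta, gammas):
    """Right side of (132), with the supremum restricted to the grid `gammas`."""
    a = math.log((1.0 - p) / p)          # coefficient of the source term
    b = math.log((1.0 - delta) / delta)  # coefficient of the channel term, > 0
    gap = n * (math.log(2.0) - h_nats(delta)) - k * (h_nats(p) - h_nats(d))  # nC - kR(d)
    src, ch = binom_pmf(k, p), binom_pmf(n, delta)
    tail = [0.0] * (n + 2)               # tail[t] = P[T_delta^n >= t]
    for t in range(n, -1, -1):
        tail[t] = tail[t + 1] + ch[t]
    best = -math.inf
    for g in gammas:
        prob = 0.0
        for t1, weight in enumerate(src):
            # Event: (t2 - n*delta) * b >= gap + g - (t1 - k*p) * a
            t2_min = math.ceil(n * delta + (gap + g - (t1 - k * p) * a) / b - 1e-12)
            prob += weight * tail[min(max(t2_min, 0), n + 1)]
        best = max(best, prob - math.exp(-g))
    return best
```

A call such as `converse_bms_bsc(200, 50, 0.05, 0.4, 0.11, [1.0, 5.0, 10.0, 20.0])`, where far more source symbols are sent than the channel can support, gives a lower bound on $\epsilon$ close to one, while reversing the roles of $k$ and $n$ makes the bound trivial.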

Proof: Let $P_{\bar{Y}^n} = P_{Y^{n*}}$, which is the equiprobable distribution on $\{0, 1\}^n$. An easy exercise
reveals that

$\jmath_{S^k}(s^k, d) = \imath_{S^k}(s^k) - k h(d)$ (133)

$\imath_{S^k}(s^k) = k h(p) + \left( w(s^k) - kp \right) \log \frac{1 - p}{p}$ (134)

$\imath_{X^n;Y^{n*}}(x^n; y^n) = n\left( \log 2 - h(\delta) \right) - \left( w(y^n - x^n) - n\delta \right) \log \frac{1 - \delta}{\delta}$ (135)

Since $w(Y^n - x^n)$ is distributed as $T_\delta^n$ regardless of $x^n \in \{0, 1\}^n$, and $w(S^k)$ is distributed as
$T_p^k$, Theorem 2 applies, and (33) becomes (132). ■
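The hypothesis-testing converse in Theorem 4 particularizes to the following result:

Theorem 11 (Converse, BMS-BSC). Any $(k, n, d, \epsilon)$ code for transmission of a BMS with bias
$p$ over a BSC with bias $\delta$ must satisfy

$\mathbb{P}\left[ U\left( \tfrac{1}{2}, \tfrac{1}{2} \right) < r \right] + \lambda\, \mathbb{P}\left[ U\left( \tfrac{1}{2}, \tfrac{1}{2} \right) = r \right] \leq \left\langle {k \atop \lfloor kd \rfloor} \right\rangle 2^{-k}$ (136)

where the discrete random variable $U(\alpha, \beta)$ is given by

$U(\alpha, \beta) = T_\alpha^k \log \frac{1 - p}{p} + T_\beta^n \log \frac{1 - \delta}{\delta}$ (137)

and $0 \leq \lambda < 1$ and $r$ are uniquely defined by

$\mathbb{P}\left[ U(p, \delta) < r \right] + \lambda\, \mathbb{P}\left[ U(p, \delta) = r \right] = 1 - \epsilon$ (138)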



Proof: As in the proof of Theorem 10, we let $P_{\bar{Y}^n}$ be the equiprobable distribution on
$\{0, 1\}^n$, $P_{\bar{Y}^n} = P_{Y^{n*}}$. Since under $P_{Y^n|X^n = x^n}$, $w(Y^n - x^n)$ is distributed as $T_\delta^n$, and under
$P_{Y^{n*}}$, $w(Y^n - x^n)$ is distributed as $T_{\frac{1}{2}}^n$, irrespective of the choice of $x^n \in \mathcal{A}^n$, according to both
measures the distribution of the information density in (135) does not depend on the choice of
$x^n$, so Theorem 5 applies. Further, we choose $Q_{S^k}$ to be the equiprobable distribution on $\{0, 1\}^k$
and observe that under $P_{S^k}$, the random variable $w(S^k)$ in (134) has the same distribution as
$T_p^k$, while under $Q_{S^k}$ it has the same distribution as $T_{\frac{1}{2}}^k$. Therefore, the log-likelihood ratio for
testing between $P_{S^k} P_{Y^n|X^n = x^n}$ and $Q_{S^k} P_{Y^{n*}}$ has the same distribution as ('~' denotes equality
in distribution)

$\log \frac{P_{S^k}(S^k) P_{Y^n|X^n = x^n}(Y^n)}{Q_{S^k}(S^k) P_{Y^{n*}}(Y^n)} = \imath_{X^n;Y^{n*}}(x^n; Y^n) - \imath_{S^k}(S^k) + k \log 2$ (139)

$\sim n \log(2 - 2\delta) - k \log \frac{1}{2(1 - p)} - \begin{cases} U(p, \delta) & \text{under } P_{S^k} P_{Y^n|X^n = x^n} \\ U\left( \frac{1}{2}, \frac{1}{2} \right) & \text{under } Q_{S^k} P_{Y^{n*}} \end{cases}$ (140)

so $\beta_{1-\epsilon}(P_{S^k} P_{Y^n|X^n = x^n}, Q_{S^k} P_{Y^{n*}})$ is equal to the left side of (136). Finally, matching the size
of the list to the fidelity of reproduction using (52), we find that $L$ is equal to the right side of
(136). ■
If the source is equiprobable, the bound in Theorem 11 becomes particularly simple, as the
following result details.

Theorem 12 (Converse, EBMS-BSC). For $p = \frac{1}{2}$, if there exists a $(k, n, d, \epsilon)$ joint source-channel
code, then



where

$r^* = \max\left\{ r \colon \sum_{t=0}^{r} \binom{n}{t} \delta^t (1 - \delta)^{n-t} \leq 1 - \epsilon \right\}$ (142)

and $\lambda \in [0, 1)$ is the solution to



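The achievability result in Theorem 7 is particularized as follows.

Theorem 13 (Achievability, BMS-BSC). There exists a $(k, n, d, \epsilon)$ joint source-channel code
with

$\epsilon \leq \inf_{M, \gamma > 0} \Big\{ \mathbb{E}\Big[ \exp\Big( -\Big| nC - (T_\delta^n - n\delta) \log \frac{1 - \delta}{\delta} - \log \frac{\gamma}{p(T_p^k)} - \log \sum_{j=1}^{M} \frac{1}{j} \Big|^+ \Big) \Big]$ (144)
$+\, e^{1 - \gamma} + \mathbb{E}\left[ (1 - p(T_p^k))^M \right] \Big\}$ (145)

where $p \colon \{0, 1, \ldots, k\} \mapsto [0, 1]$ is defined as

$p(T) = \sum_{t=0}^{k} L(T, t)\, q^t (1 - q)^{k-t}$ (146)

where

$L(T, t) = \begin{cases} \binom{T}{t_0} \binom{k - T}{t - t_0} & t - kd \leq T \leq t + kd \\ 0 & \text{otherwise} \end{cases}$ (147)

$t_0 = \left\lceil \frac{t + T - kd}{2} \right\rceil^+$

$q = \frac{p - d}{1 - 2d}$ (148)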



Proof: We weaken the infima over $P_{X^n}$ and $P_{Z^k}$ in (78) by choosing them to be the product
distributions generated by the capacity-achieving channel input distribution and the rate-distortion
function-achieving reproduction distribution, respectively, i.e. $P_{X^n}$ is equiprobable on $\{0, 1\}^n$,
and $P_{Z^k} = P_{Z^*} \times \ldots \times P_{Z^*}$, where $P_{Z^*}(1) = q$. As shown in [12, proof of Theorem 21],

$P_{Z^k}(B_d(s^k)) \geq p(w(s^k))$ (149)

On the other hand, $|Y^n - X^n|$ is distributed as $T_\delta^n$, so (144) follows by plugging (135) and
(149) into (78). ■
In the special case of the BMS-BSC, Theorem 9 can be strengthened as follows.

Theorem 14 (Gaussian approximation, BMS-BSC). The parameters of the optimal $(k, n, d, \epsilon)$
code satisfy (106) where $R(d)$, $C$, $V(d)$, $V$ are given by (127), (128), (129), (130), respectively,
and the remainder term in (106) satisfies

O(l)<0(n) (150) 
< ( 1 + J log fc + log log A; + 0(1) (151) 
if $0 < d < p$, and

-~logn + 0(l) < 9{n) (152) 
<Ilogn + 0(l) (153) 

ifd = 0. 

Proof: An asymptotic analysis of the converse bound in Theorem 11 akin to that found in
[12, proof of Theorem 23] leads to (150) and (152). An asymptotic analysis of the achievability
bound in Theorem 13 similar to the one found in [12, Appendix G] leads to (151). Finally, (153)
is the same as (117). ■

The bounds and the Gaussian approximation (in which we take $\theta(n) = 0$) are plotted in Fig. 3
($d = 0$), Fig. 4 (fair binary source, $d > 0$) and Fig. 5 (biased binary source, $d > 0$). A source of
fair coin flips has zero dispersion, and as anticipated in Remark 5, JSCC does not afford much
gain in the finite blocklength regime (Fig. 4). The situation is different if the source is biased,
with JSCC showing significant gain over SSCC (Figures 3 and 5).

VII. Transmission of a GMS over an AWGN Channel

In this section we analyze the setup where the Gaussian memoryless source $S_i \sim \mathcal{N}(0, \sigma_S^2)$ is
transmitted over an AWGN channel, which, upon receiving an input $x^n$, outputs $Y^n = x^n + N^n$,
where $N^n \sim \mathcal{N}(0, \sigma_N^2 \mathbf{I})$. The encoder/decoder must satisfy two constraints, the fidelity constraint
and the cost constraint:

• the MSE distortion exceeds $0 \leq d \leq \sigma_S^2$ with probability no greater than $0 < \epsilon < 1$;

• each channel codeword satisfies the equal power constraint in (125).$^5$

The capacity-cost function and the rate-distortion function are given by

$R(d) = \frac{1}{2} \log \frac{\sigma_S^2}{d}$ (154)
$C(P) = \frac{1}{2} \log(1 + P)$ (155)

$^5$ See Remark 9 in Section V for a discussion of the close relation between an equal and a maximal power constraint.

The source dispersion is given by [12]:

$V(d) = \frac{1}{2} \log^2 e$ (156)

while the channel dispersion is given by (110) [14].
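For a quick feel of the numbers, the sketch below plugs (154)-(156) and (110) into the Gaussian approximation (106) with the remainder term dropped and solves for the excess-distortion probability at given $k$ and $n$. It is an illustrative calculation only: the function names are hypothetical, natural logarithms are used (so $\log^2 e = 1$), and ignoring $\theta(n)$ is an assumption that may be inaccurate at short blocklengths.

```python
import math
from statistics import NormalDist

def gms_awgn_params(sigma_s2, d, P):
    """Return (R(d), C(P), V(d), V) in nats, following (154), (155), (156), (110)."""
    R = 0.5 * math.log(sigma_s2 / d)
    C = 0.5 * math.log(1.0 + P)
    Vd = 0.5                                   # (1/2) log^2 e with natural logs
    V = 0.5 * (1.0 - 1.0 / (1.0 + P) ** 2)
    return R, C, Vd, V

def approx_excess_prob(k, n, sigma_s2, d, P):
    """Solve nC - kR(d) = sqrt(nV + kV(d)) * Qinv(eps) for eps (remainder dropped)."""
    R, C, Vd, V = gms_awgn_params(sigma_s2, d, P)
    return 1.0 - NormalDist().cdf((n * C - k * R) / math.sqrt(n * V + k * Vd))

if __name__ == "__main__":
    # Illustrative values: unit-variance source, d = 0.5, SNR P = 1, so R(d) = C(P).
    print(approx_excess_prob(100, 100, 1.0, 0.5, 1.0))  # operating exactly at k R(d) = n C
    print(approx_excess_prob(80, 100, 1.0, 0.5, 1.0))   # backing off lowers the estimate
```

Operating exactly at $kR(d) = nC$ gives an estimate of one half, which is the Gaussian-approximation counterpart of the fact that the Shannon limit itself is not attainable with small excess-distortion probability at finite blocklength.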

In the rest of the section, $W_\lambda^\ell$ denotes a noncentral chi-square distributed random variable with
$\ell$ degrees of freedom and non-centrality parameter $\lambda$, independent of all other random variables.
The Euclidean norm is denoted by $|\cdot|$.

[Figure: curves labeled "Approximation, JSCC = SSCC (106)" and "Achievability, SSCC (77)"; horizontal axis from 200 to 1000.]

Fig. 4. Rate-blocklength tradeoff for the transmission of a fair BMS over a BSC with crossover probability $\delta = d = 0.11$
and $\epsilon = 10^{-2}$.



A straightforward particularization of the $d$-tilted information converse in Theorem 2 leads to
the following result.

Theorem 15 (Converse, GMS-AWGN). If there exists a $(k, n, d, \epsilon)$ code, then

$\epsilon \geq \sup_{\gamma > 0} \left\{ \mathbb{P}\left[ \frac{\log e}{2}\left( W_0^k - k \right) + \frac{\log e}{2}\left( \frac{P}{1 + P} W_{\frac{n}{P}}^n - n \right) \geq nC(P) - kR(d) + \gamma \right] - \exp(-\gamma) \right\}$ (157)

Observe that the terms to the left of the '≥' sign inside the probability in (157) are zero-mean
random variables whose variances are equal to $kV(d)$ and $nV$, respectively.

Proof: The spherically-symmetric $P_{\bar{Y}^n} = P_{Y^{n*}} = P_{Y^*} \times \ldots \times P_{Y^*}$, where $Y^* \sim \mathcal{N}(0, \sigma_N^2(1 +
P))$ is the capacity-achieving output distribution, satisfies the symmetry assumption of Theorem
2. More precisely, it is not hard to show (see [14, (205)]) that for all $x^n \in \mathcal{F}(P)$, $\imath_{X^n;Y^{n*}}(x^n; Y^n)$
has the same distribution under $P_{Y^n|X^n = x^n}$ as

$\frac{n}{2} \log(1 + P) - \frac{\log e}{2}\left( \frac{P}{1 + P} W_{\frac{n}{P}}^n - n \right)$ (158)

The $d$-tilted information in $s^k$ is given by

$\jmath_{S^k}(s^k, d) = \frac{k}{2} \log \frac{\sigma_S^2}{d} + \left( \frac{|s^k|^2}{\sigma_S^2} - k \right) \frac{\log e}{2}$ (159)

Plugging (158) and (159) into (33), (157) follows.

The hypothesis testing converse in Theorem 5 is particularized as follows.
Theorem 16 (HT converse, GMS-AWGN). 

d . 1 

dr < 1 



k I r fc-1 P 
'o 



n { 1+ -p) a 2 



where r is the solution to 



P 



P 



1 + P T 



Wl + < nr 



1-6 



(160) 



(161) 



Proof: As in the proof of Theorem 15, we let $\bar{Y}^n \sim Y^{n*} \sim \mathcal{N}(0, \sigma_N^2(1 + P)\mathbf{I})$. Under
$P_{Y^n|X^n = x^n}$, the distribution of $\imath_{X^n;Y^{n*}}(x^n; Y^{n*})$ is that of (158), while under $P_{Y^{n*}}$, it has the
same distribution as (cf. [14, (204)])

$\frac{n}{2} \log(1 + P) - \frac{\log e}{2}\left( P\, W_{n\left( 1 + \frac{1}{P} \right)}^n - n \right)$ (162)



Since the distribution of 2x";y™*(# n ; F n *) does not depend on the choice of x n E W 1 according 
to either measure, Theorem \5\ applies. Further, choosing Q S k to be the Lebesgue measure on 
R k , i.e. dQ S k = ds k , observe that 



log/ 5 ,(^) = log.^ fc(sfc) - k ^~'«-^ l0g6 ^^ 



ds k 

Now, (11601) and (11611) are obtained by integrating 



(163) 



l| log f sk (s k ) + i xn . Yn *(x n ; y n ) > \ log(l + P) + \ loge - \ log(27ra|) - ^nr ) (164) 

with respect to ds k dP Y n*(y n ) and dP S k(s k )dP Y ^ xn=x u(y n ), respectively. 
The bound in Theorem [7] can be computed as follows. 



Theorem 17 (Achievability, GMS-AWGN). There exists a (k, n, d, e) code such that 

loge 



M,7>0 

+ e~ 7 + E 









exp / — 



nC(P) 

(i-pK)) 



Wn - n - log —, j— 

1 + P r J B p(W k ) 



, M 



where 



fw? P (t) 
max , , r < oo 



(165) 



(166) 
$f_W$ is the probability density function of $W$, and $p \colon \mathbb{R}_+ \mapsto [0, 1]$ is defined by

r(| + i) 



P(t) 



0F fcr + i) 



1-L 



fc— i 

2 



where 



L{r) 



1 - 



1 d 



. / d 



l+r z -2 



otherwise 



4| 1-4 U2 



(167) 



(168) 



Proof: We compute an upper bound to (|78T ) for the specific case of the GMS over an 
AWGN channel. First, we weaken the infimum over P Z k in (|78T) by choosing P Z k to be the 
uniform distribution on the surface of the A;-dimensional sphere with center at and radius 
r = y/kayj\ — It was shown in [fT2l proof of Theorem 37] (see also H, |[T8ll ), 

P zh (B d (s h )) > p (\s k \ 2 ) (169) 

which takes care of the source random variable in (T78T ). 

Now let us consider the channel random variable $\imath_{X^n;Y^n}(X^n; Y^n)$. Observe that since $X^n$ lies
on the power sphere and the noise is spherically symmetric, $|Y^n|^2 = |X^n + N^n|^2$ has the same
distribution as $|x_0^n + N^n|^2$, where $x_0^n$ is an arbitrary point on the surface of the power sphere.
Letting $x_0^n = \sigma_N \sqrt{P}\,(1, 1, \ldots, 1)$, we see that $\frac{1}{\sigma_N^2} |x_0^n + N^n|^2 = \sum_{i=1}^{n} \left( \frac{1}{\sigma_N} N_i + \sqrt{P} \right)^2$ has non-
central chi-squared distribution with $n$ degrees of freedom and noncentrality parameter $nP$. To
simplify calculations, we express the information density as

HP 

ix«M^V n ) = *x»;Y«*{xby n ) - -teP- (V n ) (170) 

Biyn* 

where Y n * ~ J\f(0, o^(l+P)I). The distribution of i x -^*{xq] Y n ) is the same as (fl58i Further, 
due to the spherical symmetry of both Pyn and Pyn*, as discussed above, we have 

which is bounded uniformly in n as observed in [14, (425), (435)], thus (11661) is finite, and (11651) 
follows. ■ 
The following result strengthens Theorem 9 in the special case of the GMS-AWGN.

Theorem 18 (Gaussian approximation, GMS-AWGN). The parameters of the optimal $(k, n, d, \epsilon)$
code satisfy (106) where $R(d)$, $C$, $V(d)$, $V$ are given by (154), (155), (156), (110), respectively,
and the remainder term in (106) satisfies

0(l)<6(n) (172) 

< ( 1 + t— ) logfc + loglogfc + 0(l) (173) 



loge 

Proof: An asymptotic analysis of the converse bound in Theorem 16 similar to that found
in [12, proof of Theorem 40] leads to (172). An asymptotic analysis of the achievability bound
in Theorem 17 similar to [12, Appendix K] leads to (173). ■

While their numerical evaluation reveals that the bounds are not as tight as those in Section
VI, they suffice to show that JSCC noticeably outperforms SSCC in the displayed region of
blocklengths (Fig. 6).

VIII. To Code or Not to Code

In this section, we compare the performance of blocklength-1 codes with that of the best
blocklength-$n$ codes, leveraging the bounds in Sections III and IV and the approximation in
Section V. We show certain examples when symbol-by-symbol coding is, in fact, either optimal
or very close to being optimal.

A. Performance of symbol-by-symbol source-channel codes

Definition 9. An $(n, d, \epsilon, \alpha)$ symbol-by-symbol code is an $(n, n, d, \epsilon, \alpha)$ code $(f, g)$ (according
to Definition 7) that satisfies

$f(s^n) = (f_1(s_1), \ldots, f_1(s_n))$ (174)

$g(y^n) = (g_1(y_1), \ldots, g_1(y_n))$ (175)

for some pair of functions $f_1 \colon \mathcal{S} \mapsto \mathcal{A}$ and $g_1 \colon \mathcal{B} \mapsto \hat{\mathcal{S}}$.

The minimum excess distortion achievable with symbol-by-symbol codes at channel blocklength
$n$, excess probability $\epsilon$ and cost $\alpha$ is defined by

$D_1(n, \epsilon, \alpha) = \inf\left\{ d \colon \exists\, (n, d, \epsilon, \alpha) \text{ symbol-by-symbol code} \right\}$. (176)

Definition 10. The distortion-dispersion function of symbol-by-symbol joint source-channel coding
is defined as

$W_1(\alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} n \left( \frac{D(C(\alpha)) - D_1(n, \epsilon, \alpha)}{Q^{-1}(\epsilon)} \right)^2$ (177)

where $D(\cdot)$ is the distortion-rate function of the source.

As before, if there is no cost constraint ($c_n(x^n) = 0$ for all $x^n \in \mathcal{A}^n$), we will simplify the
notation and write $D_1(n, \epsilon)$ for $D_1(n, \epsilon, \alpha)$ and $W_1$ for $W_1(\alpha)$.

A symbol-by-symbol code has rate 1. Our goal in this section is to compare the excess
distortion performance of the optimal code of rate 1 at channel blocklength $n$ with that of the
optimal symbol-by-symbol code, evaluated after $n$ channel uses.

In addition to restrictions (i)-(iv) of Section V, we assume that the channel and the source
are probabilistically matched in the following sense (cf. [11]).

(v) There exist $\alpha$, $P_{X^*|S}$ and $P_{Z^*|Y}$ such that $P_{X^*}$ and $P_{Z^*|S}$ generated by the joint distribution
$P_S P_{X^*|S} P_{Y|X} P_{Z^*|Y}$ achieve the capacity-cost function $C(\alpha)$ and the distortion-rate function
$D(C(\alpha))$, respectively. The channel cost function is assumed to be separable, $c_n(x^n) =
\frac{1}{n} \sum_{i=1}^{n} c(x_i)$.

Condition (v) ensures that symbol-by-symbol transmission attains the minimum average (over
source realizations) distortion achievable among all codes of any blocklength. The following
results pertain to the full distribution of the distortion incurred at the receiver output and not
just its mean.

Theorem 19 (Achievability, symbol-by-symbol code). Under restrictions (i)-(v), if

$\mathbb{P}\left[ \sum_{i=1}^{n} d(S_i, Z_i^*) > nd \right] \leq \epsilon$ (178)

where $P_{Z^{n*}|S^n} = P_{Z^*|S} \times \ldots \times P_{Z^*|S}$, and $P_{Z^*|S}$ achieves $D(C(\alpha))$, then there exists an $(n, d, \epsilon, \alpha)$
symbol-by-symbol code (average cost constraint).

Proof: As shown in [11], if (v) holds there exist a symbol-by-symbol encoder and decoder
such that the conditional distribution of the output of the decoder given the source outcome
coincides with distribution $P_{Z^*|S}$, so the excess-distortion probability of this symbol-by-symbol
code is given by (178). ■

Theorem 20 (Converse, symbol-by- symbol code). Under restriction ©, any (n, d, e, a) symbol- 
by-symbol code (average cost constraint) must satisfy 



$\epsilon \geq \inf_{P_{Z|S} \colon I(S;Z) \leq C(\alpha)} \mathbb{P}\left[ d_n(S^n, Z^n) > d \right]$ (179)

where $P_{Z^n|S^n} = P_{Z|S} \times \ldots \times P_{Z|S}$.

Proof: The excess-distortion probability at blocklength $n$, distortion $d$ and cost $\alpha$ achievable
among all single-letter codes $P_{X|S}$, $P_{Z|Y}$ must satisfy

$\epsilon \geq \inf_{P_{X|S}, P_{Z|Y} \colon\; S - X - Y - Z,\; \mathbb{E}[c(X)] \leq \alpha} \mathbb{P}\left[ d_n(S^n, Z^n) > d \right]$ (180)

$\geq \inf_{P_{X|S}, P_{Z|Y} \colon\; \mathbb{E}[c(X)] \leq \alpha,\; I(S;Z) \leq I(X;Y)} \mathbb{P}\left[ d_n(S^n, Z^n) > d \right]$ (181)

where (181) holds since $S - X - Y - Z$ implies $I(S; Z) \leq I(X; Y)$ by the data processing inequality.
The right side of (181) is lower bounded by the right side of (179) because $I(X; Y) \leq C(\alpha)$
holds for all $P_X$ with $\mathbb{E}[c(X)] \leq \alpha$. ■

Theorem 21 (Gaussian approximation, optimal symbol-by-symbol code). Assume $\mathbb{E}[d^3(S, Z^*)] <
\infty$. Under restrictions (i)-(v),

$D_1(n, \epsilon, \alpha) = D(C(\alpha)) + \sqrt{\frac{W_1(\alpha)}{n}}\, Q^{-1}(\epsilon) + \frac{\theta_1(n)}{n}$ (182)

$W_1(\alpha) = \mathrm{Var}\left[ d(S, Z^*) \right]$ (183)

where

$\theta_1(n) \leq O(1)$ (184)

Moreover, if there is no power constraint,

Oi(n) > ^^0(n) (185) 
Wi = W(l) (186) 

where $\theta(n)$ is that in Theorem 9.

If $\mathrm{Var}[d(S, Z^*)] > 0$ and $\mathcal{S}$, $\hat{\mathcal{S}}$ are finite, then

$\theta_1(n) \geq O(1)$ (187)

Proof: Since the third absolute moment of $d(S_i, Z_i^*)$ is finite, the achievability part of (182),
namely, (182) with the remainder satisfying (184), follows by a straightforward application of
the Berry-Esseen bound to (178), provided that $\mathrm{Var}[d(S_i, Z_i^*)] > 0$. If $\mathrm{Var}[d(S_i, Z_i^*)] = 0$, it
follows trivially from (178).

To show the converse in (185), observe that since the set of all $(n, n, d, \epsilon)$ codes includes all
$(n, d, \epsilon)$ symbol-by-symbol codes, we have $D(n, n, \epsilon) \leq D_1(n, \epsilon)$. Since $Q^{-1}(\epsilon)$ is positive or
negative depending on whether $\epsilon < \frac{1}{2}$ or $\epsilon > \frac{1}{2}$, using (122) we conclude that we must necessarily
have (186), which is, in fact, a consequence of conditions (b), (c) in Section III and (v). Now,
(185) is simply the converse part of (121).

The proof of the refined converse in (11871) is relegated to Appendix El ■ 

Theorem 21 shows that if the source and the channel are probabilistically matched in the
sense of [11], then not only does symbol-by-symbol transmission achieve the minimum average
distortion, but also the dispersion of JSCC. In other words, not only do such symbol-by-symbol
codes attain the minimum average distortion but also the variance of distortions at the decoder's
output is the minimum achievable among all codes operating at that average distortion.

Two conspicuous examples that satisfy the probabilistic matching condition (v), so that symbol-
by-symbol coding is optimal in terms of average distortion, are the transmission of a binary
equiprobable source over a binary-symmetric channel provided the desired bit error rate is
equal to the crossover probability of the channel [19, Sec. 11.8], [20, Problem 7.16], and the
transmission of a Gaussian source over an additive white Gaussian noise channel under the
mean-square error distortion criterion, provided that the tolerable source signal-to-noise ratio
attainable by an estimator is equal to the signal-to-noise ratio at the output of the channel [21].
We dissect these two examples next.

B. Symbol-by-symbol coding for lossy transmission of BMS over BSC

In the setup of Section VI, if the source is unbiased ($p = \frac{1}{2}$), then $C = 1 - h(\delta)$, $R(d) =
1 - h(d)$, and $D(C) = \delta$. If the encoder and the decoder are both identity mappings (uncoded
transmission), the resulting joint distribution satisfies condition (v). Using (122) and (183), it is
easy to verify that

$W(1) = W_1 = \delta(1 - \delta)$ (188)

that is, uncoded transmission is optimal in terms of dispersion, as anticipated in (186).

Moreover, regardless of the allowed $\epsilon$, uncoded transmission attains the minimum distortion
$D(n, n, \epsilon)$ achievable among all codes operating at blocklength $n$, as the following result
demonstrates.

Theorem 22 (BMS-BSC, symbol-by-symbol code). At blocklength $n$ and excess distortion
probability $\epsilon$, the uncoded scheme achieves, regardless of $0 \leq p \leq 1$, $\delta \leq \frac{1}{2}$,

$D_1(n, \epsilon) = \min\left\{ d \colon \sum_{t=0}^{\lfloor nd \rfloor} \binom{n}{t} \min\{p, \delta\}^t \left( 1 - \min\{p, \delta\} \right)^{n-t} \geq 1 - \epsilon \right\}$ (189)

Moreover, if the source is equiprobable ($p = \frac{1}{2}$),

$D_1(n, \epsilon) = D(n, n, \epsilon)$ (190)
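The minimization in (189) reduces to locating a quantile of a binomial distribution, so it can be computed directly. The sketch below is one straightforward way to do it; the function name `uncoded_distortion` is an assumption, and the direct use of `math.comb` limits it to moderate blocklengths (a log-domain evaluation would be needed for very large $n$).

```python
import math

def uncoded_distortion(n, eps, p, delta):
    """Smallest d = t/n whose binomial CDF at t reaches 1 - eps, as in (189)."""
    q = min(p, delta)
    cdf = 0.0
    for t in range(n + 1):
        cdf += math.comb(n, t) * q**t * (1.0 - q) ** (n - t)
        if cdf >= 1.0 - eps:
            return t / n
    return 1.0  # reached only through floating-point rounding

if __name__ == "__main__":
    # Fair source over a BSC with crossover probability 0.11 (illustrative values).
    for n in (50, 100, 500):
        print(n, uncoded_distortion(n, 0.01, 0.5, 0.11))
```

As $n$ grows the returned value decreases towards $\min\{p, \delta\}$, at a speed governed by the variance $\delta(1 - \delta)$ appearing in (188).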
Proof: Direct calculation yields (189). To show (190), let us compare $d^* = D_1(n, \epsilon)$ with
the conditions imposed on $d$ by Theorem 12. Comparing (189) to (142), we see that either
(a) equality in (189) is achieved, $r^* = nd^*$, $\lambda = 0$, and (plugging $k = n$ into (141))




(191) 



thereby implying that $d \geq d^*$, or
(b) $r^* = nd^* - 1$, $\lambda > 0$, and (141) becomes

+ <192) 

which also implies $d \geq d^*$. To see this, note that $d < d^*$ would imply $\lfloor nd \rfloor \leq nd^* - 1$ since
$nd^*$ is an integer, which in turn would require (according to (192)) that $\lambda \leq 0$, which is
impossible.

■ 

For the transmission of the fair binary source over a BSC, Figure 7 shows the distortion
achieved by the uncoded scheme and the separated scheme versus $n$ for a fixed excess-distortion
probability $\epsilon = 0.01$. Figure 8(a) shows the rate achieved by separate coding when $d > \delta$ is
fixed, and the excess-distortion probability $\epsilon$, shown in Fig. 8(b), is set to be the one achieved by
uncoded transmission, namely, (189). Figure 8(a) highlights the fact that at short blocklengths
(say $n < 100$) separate source/channel coding is vastly suboptimal. As the blocklength increases,
the performance of the separated scheme approaches that of the no-coding scheme, but according
to Theorem 22 it can never outperform it. Had we allowed the excess distortion probability to
vanish sufficiently slowly, the JSCC curve would have approached the Shannon limit as $n \to \infty$.
However, in Figure 8(a), the exponential decay in $\epsilon$ is such that there is indeed an asymptotic


rate penalty as predicted in 0. 

For the biased binary source with p = | and BSC with crossover probability 0.11, Figured 
plots the maximum distortion achieved with probability 0.99 by the uncoded scheme, which in 
this case is asymptotically suboptimal. Nevertheless, uncoded transmission performs remarkably 
well in the displayed range of blocklengths, achieving the converse almost exactly at blocklengths 
less than 100, and outperforming the JSCC achievability result in Theorem 13 at blocklengths as 
long as 900. This example substantiates that even in the absence of a probabilistic match between 
the source and the channel, symbol-by-symbol transmission, though asymptotically suboptimal, 
might outperform SSCC and even our random JSCC achievability bound in the finite blocklength 
regime. 

C. Symbol-by-symbol coding for lossy transmission of a GMS over an AWGN 
In the setup of Section VII, using (154) and (155), we find that 

$$D(C(P)) = \frac{\sigma_S^2}{1 + P} \tag{193}$$

The next result characterizes the distribution of the distortion incurred by the symbol-by-symbol 
scheme that attains the minimum average distortion. 

Theorem 23 (GMS-AWGN, symbol-by-symbol code). The following symbol-by-symbol 
transmission scheme in which the encoder and the decoder are the amplifiers: 

$$f_1(s) = a s, \quad a^2 = \frac{P \sigma_N^2}{\sigma_S^2} \tag{194}$$

$$g_1(y) = b y, \quad b = \frac{a \sigma_S^2}{a^2 \sigma_S^2 + \sigma_N^2} \tag{195}$$

is an $(n, d, \epsilon, P)$ symbol-by-symbol code (with average cost constraint) such that 

$$\mathbb{P}\left[ W_0^n D(C(P)) > nd \right] = \epsilon \tag{196}$$

where $W_0^n$ is chi-square distributed with $n$ degrees of freedom. 
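
Since the excess-distortion probability in Theorem 23 is a chi-square tail, it can be computed in closed form when $n$ is even. The Python sketch below does this under two stated assumptions: the asymptotically achievable distortion is taken to be $D(C(P)) = \sigma_S^2 / (1 + P)$, and $n$ is even so that the chi-square tail reduces to a finite sum. The function names are hypothetical and introduced only for illustration.

```python
from math import exp


def chi2_sf_even(x: float, n: int) -> float:
    """P[W > x] for W chi-square with an even number n of degrees of freedom.

    For n = 2m the tail equals exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    Intended for moderate n; very large n would overflow the partial sums.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("this closed form needs a positive even n")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term
    for j in range(1, n // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def uncoded_excess_prob(n: int, d: float, P: float, sigma_s2: float = 1.0) -> float:
    """Excess-distortion probability P[W * D(C(P)) > n d] of the amplifier scheme."""
    D = sigma_s2 / (1.0 + P)  # assumed closed form of D(C(P))
    return chi2_sf_even(n * d / D, n)


if __name__ == "__main__":
    for d in (0.5, 0.6, 0.7):
        print(d, uncoded_excess_prob(n=100, d=d, P=1.0))
```

Sweeping `d` for fixed `n` and `P` traces out the distortion versus excess-probability tradeoff of the uncoded scheme.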

Note that (196) is a particularization of (179). Using (196), we find that 

Wi(P) = 2 J D 2 (C(P)) log 2 e (197) 
On the other hand, using (122), we compute 

W(l, P) = D 2 (C(P)) (2 - D\C{P))) log 2 e (198) 
which means that for a\ < 1 + P 

W 1 (P)>W(1,P) (199) 

The difference between (199) and (186) is due to the fact that the optimal symbol-by-symbol 
code in Theorem 23 obeys an average power constraint, rather than the more stringent maximal 
power constraint of Theorem 9, so it is not surprising that for $\epsilon > \frac{1}{2}$ the symbol-by-symbol 
code outperforms the best code obeying the maximal power constraint. More interestingly, in 
the practically relevant case $\epsilon < \frac{1}{2}$, (199) implies that the symbol-by-symbol code of Theorem 
23 is suboptimal in terms of dispersion, even though it achieves the minimum average distortion. 
Nevertheless, in the range of blocklengths displayed in Figure 10, the symbol-by-symbol code 
even outperforms the converse for codes operating under a maximal power constraint. 

IX. Conclusion 

The approach taken in this paper to analyze the non-asymptotic fundamental limits of lossy 
joint source-channel coding is two-fold. Our new achievability and converse bounds apply to 
abstract sources and channels and allow for memory, while the asymptotic analysis of the new 
bounds leading to the dispersion of JSCC is focused on the most basic scenario of transmitting 
a stationary memoryless source over a stationary memoryless channel. 
The major results and conclusions are the following. 

• A general new converse bound (Theorem [3]) leverages the concept of d -tilted information 
(Definition [5]), a random variable which corresponds (in a sense that can be formalized lfl2l . 
B221) to the number of bits required to represent a given source outcome within distortion 
d and whose role in lossy compression is on a par with that of information (in (fT6l) ) in 
lossless compression. 

• The converse result in Theorem @] capitalizes on two simple observations, namely, that any 
(d, e) lossy code can be converted to a list code with list error probability e, and that a 
binary hypothesis test between $P_{SXY}$ and an auxiliary distribution on the same space can 
be constructed by choosing Psxy when there is no list error. 

• As evidenced by our numerical results, the converse result in Theorem [5j which applies to 
those channels satisfying a certain symmetry condition and which is a consequence of the 
hypothesis testing converse in Theorem @] can outperform the d-tilted information converse 
in Theorem |3] Nevertheless, it is Theorem [3] that lends itself to analysis more easily and 
that leads to the JSCC dispersion for the general DMC. 

• Our random-coding-based achievability bound (Theorem [7]) provides insights into the degree 
of separation between the source and the channel coding required for optimal performance 
in the finite blocklength regime. More precisely, it reveals that the dispersion of JSCC can be 
achieved in the class of (M, d, e) JSCC codes (Definition [8]). As in separate source/channel 
coding, in (M, d, e) coding the inner channel coding block is connected to the outer source 
coding block by a noiseless link of capacity logM, but unlike SSCC, the channel (resp. 
source) code can be chosen based on the knowledge of the source (resp. channel). The 
conventional SSCC in which the source code is chosen without knowledge of the channel 
and the channel code is chosen without knowledge of the source, although known to achieve 
the asymptotic fundamental limit of joint source-channel coding under certain quite general 
conditions, is in general suboptimal in the finite blocklength regime. 

• For the transmission of a stationary memoryless source over a stationary memoryless 
channel, the Gaussian approximation in Theorem [9] provides a simple estimate of the 
maximal nonasymptotically achievable joint source-channel coding rate. Appealingly, the 
dispersion of joint source-channel coding decomposes into two terms, the channel dispersion 
and the source dispersion. Thus, only two channel attributes, the capacity and dispersion, 
and two source attributes, the rate-distortion and rate-dispersion functions, are required to 
compute the Gaussian approximation to the maximal JSCC rate. 

• In those curious cases when the source and the channel are probabilistically matched so 
that symbol-by-symbol coding attains the minimum possible average distortion, Theorem 
21 ensures that it also attains the dispersion of joint source-channel coding, that is, 
symbol-by-symbol coding results in the minimum variance of distortions among all codes 
operating at that average distortion. 

• Even in the absence of a probabilistic match between the source and the channel, 
symbol-by-symbol transmission, though asymptotically suboptimal, might outperform separate 
source-channel coding and joint source-channel random coding in the finite blocklength regime. 
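
The Gaussian approximation summarized above, $nC - kR(d) \approx \sqrt{nV + kV(d)}\, Q^{-1}(\epsilon)$, can be turned into a quick estimate of the largest number $k$ of source symbols that fit into $n$ channel uses. The Python sketch below does so for a fair binary source over a BSC. It rests on several assumptions that are not taken from this section: the remainder term of the approximation is ignored, $\epsilon < \frac{1}{2}$, the BSC dispersion is taken as $\delta(1-\delta)\log_2^2\frac{1-\delta}{\delta}$, and the rate-dispersion function of the fair source is taken as zero. All function names are hypothetical.

```python
from math import log2, sqrt
from statistics import NormalDist


def h(x: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)


def fair_bms_bsc(delta: float, d: float):
    """(C, V, R(d), V(d)) in bits for a fair binary source over a BSC(delta)."""
    C = 1 - h(delta)
    V = delta * (1 - delta) * log2((1 - delta) / delta) ** 2  # assumed BSC dispersion
    R = 1 - h(d)
    Vd = 0.0  # assumed: the fair source has zero rate-dispersion
    return C, V, R, Vd


def max_source_length(n: int, eps: float, C: float, V: float, R: float, Vd: float) -> int:
    """Largest k with n*C - k*R >= sqrt(n*V + k*Vd) * Qinv(eps), remainder ignored."""
    if not 0 < eps < 0.5:
        raise ValueError("sketch assumes 0 < eps < 1/2")
    if R <= 0:
        raise ValueError("R(d) must be positive")
    q_inv = NormalDist().inv_cdf(1 - eps)  # Q^{-1}(eps)
    k = 0
    # The left side decreases and the right side grows with k, so stop at first failure.
    while n * C - (k + 1) * R >= sqrt(n * V + (k + 1) * Vd) * q_inv:
        k += 1
    return k


if __name__ == "__main__":
    params = fair_bms_bsc(delta=0.11, d=0.11)
    for n in (100, 1000, 10000):
        k = max_source_length(n, 0.01, *params)
        print(n, k, k / n)
```

Only the four scalars $C$, $V$, $R(d)$ and $V(d)$ enter the computation, which is the practical appeal of the approximation.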

Appendix A 
The Berry-Esseen theorem 

The following result is an important tool in the Gaussian approximation analysis. 

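Theorem 24 (Berry-Esseen CLT, e.g. [23, Ch. XVI.5, Theorem 2]). Fix a positive integer $n$. Let 
$W_i$, $i = 1, \ldots, n$ be independent. Then, for any real $t$ 

$$\left| \mathbb{P}\left[ \sum_{i=1}^{n} W_i > n \left( D_n + t \sqrt{\frac{V_n}{n}} \right) \right] - Q(t) \right| \le \frac{B_n}{\sqrt{n}} \tag{200}$$

where 

\begin{align}
D_n &= \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[W_i] \tag{201} \\
V_n &= \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}[W_i] \tag{202} \\
T_n &= \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[ |W_i - \mathbb{E}[W_i]|^3 \right] \tag{203} \\
B_n &= \frac{c_0 T_n}{V_n^{3/2}} \tag{204}
\end{align}

and $0.4097 \le c_0 \le 0.5600$ ($0.4097 \le c_0 < 0.4784$ for identically distributed $W_i$). 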
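
To get a feel for the size of the Berry-Esseen correction, the following Python sketch compares the exact tail of a sum of i.i.d. Bernoulli variables with $Q(t)$ and with the bound $B_n / \sqrt{n}$. It is a numerical illustration under stated assumptions: the variables are Bernoulli($p$), the constant is set to the upper value $c_0 = 0.56$, and the function names are introduced here for illustration only.

```python
from math import comb, erfc, sqrt


def Q(t: float) -> float:
    """Standard Gaussian complementary cdf."""
    return 0.5 * erfc(t / sqrt(2.0))


def berry_esseen_check(n: int, p: float, t: float, c0: float = 0.56):
    """Return (|P[sum W_i > n(D_n + t*sqrt(V_n/n))] - Q(t)|, B_n/sqrt(n))
    for W_i i.i.d. Bernoulli(p)."""
    D = p                                      # mean of each W_i
    V = p * (1 - p)                            # variance of each W_i
    T = p * (1 - p) * ((1 - p) ** 2 + p ** 2)  # third absolute central moment
    B = c0 * T / V ** 1.5
    threshold = n * (D + t * sqrt(V / n))
    # Exact binomial tail probability above the threshold.
    tail = sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k > threshold
    )
    return abs(tail - Q(t)), B / sqrt(n)


if __name__ == "__main__":
    for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
        gap, bound = berry_esseen_check(n=50, p=0.3, t=t)
        print(t, gap, bound)
```

The bound is uniform in $t$, which is what allows it to be applied with data-dependent thresholds in the proofs that follow.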



Appendix B 

Auxiliary result on the minimization of the information spectrum 

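Given a finite set $\mathcal{A}$, we say that $x^n \in \mathcal{A}^n$ has type $P_X$ if the number of times each letter 
$a \in \mathcal{A}$ is encountered in $x^n$ is $nP_X(a)$. Let $\mathcal{P}$ be the set of all distributions on $\mathcal{A}$, which is 
simply the standard $|\mathcal{A}| - 1$ simplex in $\mathbb{R}^{|\mathcal{A}|}$. For an arbitrary subset $\mathcal{D} \subseteq \mathcal{P}$, denote by $\mathcal{D}_{[n]}$ the 
set of distributions in $\mathcal{D}$ that are also $n$-types, that is, 

$$\mathcal{D}_{[n]} = \left\{ P_X \in \mathcal{D} : \exists\, x^n \in \mathcal{A}^n : \mathrm{type}(x^n) = P_X \right\} \tag{205}$$

Denote by $\Pi(P_X)$ the minimum Euclidean distance approximation of $P_X$ in the set of $n$-types, 
that is, 

$$\Pi(P_X) = \arg\min_{P \in \mathcal{P}_{[n]}} |P_X - P| \tag{206}$$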
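
The notions of type, $n$-type, and minimum Euclidean distance approximation can be made concrete with a few lines of Python. The sketch below is a brute-force illustration for small alphabets only; the function names `type_of`, `all_n_types` and `nearest_n_type` are hypothetical, and ties in the minimization are broken arbitrarily.

```python
from collections import Counter
from math import sqrt


def type_of(seq, alphabet):
    """Empirical distribution (type) of seq over the ordered alphabet."""
    counts = Counter(seq)
    return tuple(counts[a] / len(seq) for a in alphabet)


def all_n_types(n: int, m: int):
    """All n-types on an alphabet of size m, as tuples of probabilities."""
    types = []

    def rec(prefix, remaining, slots):
        if slots == 1:
            # The last letter takes whatever count is left.
            types.append(tuple(c / n for c in prefix + [remaining]))
            return
        for c in range(remaining + 1):
            rec(prefix + [c], remaining - c, slots - 1)

    rec([], n, m)
    return types


def nearest_n_type(P, n: int):
    """Brute-force minimum Euclidean distance approximation of P by an n-type."""
    return min(
        all_n_types(n, len(P)),
        key=lambda t: sqrt(sum((a - b) ** 2 for a, b in zip(P, t))),
    )


if __name__ == "__main__":
    print(type_of("aabac", "abc"))
    print(len(all_n_types(4, 3)), "types, at most", (4 + 1) ** (3 - 1))
    print(nearest_n_type((0.37, 0.41, 0.22), 10))
```

Enumerating the types also makes it easy to check the polynomial bound $(n+1)^{|\mathcal{A}|-1}$ on their number for small cases.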
Let $\mathcal{P}^\star$ be the set of capacity-achieving distributions: 

$$\mathcal{P}^\star = \left\{ P_X \in \mathcal{P} : I(X; Y) = C \right\} \tag{207}$$

Denote the minimum (maximum) information variances achieved by the distributions in $\mathcal{P}^\star$ by 

\begin{align}
V_{\min} &= \min_{P_X \in \mathcal{P}^\star} \mathrm{Var}\left[ \imath_{X;Y}(X; Y) \right] \tag{208} \\
V_{\max} &= \max_{P_X \in \mathcal{P}^\star} \mathrm{Var}\left[ \imath_{X;Y}(X; Y) \right] \tag{209}
\end{align}

and let $\mathcal{P}^\star_{\min} \subseteq \mathcal{P}^\star$ be the set of capacity-achieving distributions that achieve the minimum 
information variance: 

$$\mathcal{P}^\star_{\min} = \left\{ P_X \in \mathcal{P}^\star : \mathrm{Var}\left[ \imath_{X;Y}(X; Y) \right] = V_{\min} \right\} \tag{210}$$

and analogously $\mathcal{P}^\star_{\max}$ for the distributions in $\mathcal{P}^\star$ with maximal variance. Lemma 1 below allows us 
to show that in the memoryless case, the infimum inside the expectation in (37) is approximately 
attained by those sequences whose type is closest to the capacity-achieving distribution $P_{X^\star}$ (if it 
is non-unique, $P_{X^\star}$ is chosen appropriately based on the information variance it achieves). This 
technical result is the key to proving the converse part of Theorem 9. 

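Lemma 1. There exists $\bar{\Delta} > 0$ such that for all sufficiently large $n$: 
1) If $V_{\min} > 0$, then there exists $K \ge 0$ such that for $|\Delta| \le \bar{\Delta}$, 

$$\min_{x^n \in \mathcal{A}^n} \mathbb{P}\left[ \sum_{i=1}^{n} \imath_{X;Y}(x_i; Y_i) \le n(C - \Delta) \right] \ge \mathbb{P}\left[ \sum_{i=1}^{n} \imath^\star_{X;Y}(x_i^\star; Y_i^\star) \le n(C - \Delta) \right] - \frac{K}{\sqrt{n}} \tag{211}$$

where (211) holds for any $x^{n\star}$ whose type is in $\mathcal{P}^\star_{\min}$ if $\Delta \ge 0$, and for any $x^{n\star}$ whose type 
is in $\mathcal{P}^\star_{\max}$ if $\Delta < 0$. 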



2) If Knax = 0, then for all < a < § and A > 



min P 



i=i 



> 1 



I_3 a 
77,4 2" 



(212) 



The information densities in the left sides of (211) and (212) are computed with $\mathrm{type}(x^n) = 
P_X \to P_{Y|X} \to P_Y$, and that in the right side of (211) is computed with $\mathrm{type}(x^{n\star}) = P_{X^\star} \to 
P_{Y|X} \to P_{Y^\star}$. The independent random variables $Y_i$ in the left sides of (211) and (212) have 
distribution $P_{Y|X=x_i}$, while $Y_i^\star$ in the right side of (211) have distribution $P_{Y|X=x_i^\star}$. 
In order to prove Lemma 1, we first show three auxiliary lemmas. The first two deal with 
approximate optimization of functions. 

If $f$ and $g$ approximate each other, and the minimum of $f$ is approximately attained at $x$, then 
$g$ is also approximately minimized at $x$, as the following lemma formalizes. 

Lemma 2. Fix $\eta > 0$, $\xi > 0$. Let $\mathcal{D}$ be an arbitrary set, and let $f : \mathcal{D} \mapsto \mathbb{R}$ and $g : \mathcal{D} \mapsto \mathbb{R}$ be 
such that 

$$\sup_{x \in \mathcal{D}} |f(x) - g(x)| \le \eta \tag{213}$$

Then, 

$$g(x) \le \min_{y \in \mathcal{D}} g(y) + \xi + 2\eta \tag{214}$$

as long as $x$ satisfies 

$$f(x) \le \min_{y \in \mathcal{D}} f(y) + \xi \tag{215}$$

(see Fig. 11). 

Fig. 11. An example where (214) holds with equality. 

Proof of Lemma 2: Let $x^\star \in \mathcal{D}$ be such that $g(x^\star) = \min_{y \in \mathcal{D}} g(y)$. Using (213) and (215), 
write 

\begin{align}
g(x) &\le \min_{y \in \mathcal{D}} f(y) + g(x) - f(x) + \xi \tag{216} \\
&\le \min_{y \in \mathcal{D}} f(y) + \eta + \xi \tag{217} \\
&\le f(x^\star) + \eta + \xi \tag{218} \\
&= g(x^\star) - g(x^\star) + f(x^\star) + \eta + \xi \tag{219} \\
&\le g(x^\star) + 2\eta + \xi \tag{220}
\end{align}
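
The statement of Lemma 2 is easy to test numerically on a finite set: if $f$ and $g$ differ by at most $\eta$ everywhere, every $\xi$-approximate minimizer of $f$ is a $(\xi + 2\eta)$-approximate minimizer of $g$. The Python sketch below performs this check on lists of function values. The helper name `check_lemma2` and the small tolerance used to absorb floating-point rounding are assumptions made for illustration.

```python
import random


def check_lemma2(f_vals, g_vals, xi: float, tol: float = 1e-12) -> bool:
    """Verify g(x) <= min g + xi + 2*eta for every x with f(x) <= min f + xi,
    where eta is the largest pointwise gap between f and g."""
    eta = max(abs(a - b) for a, b in zip(f_vals, g_vals))
    f_min, g_min = min(f_vals), min(g_vals)
    for fx, gx in zip(f_vals, g_vals):
        if fx <= f_min + xi and gx > g_min + xi + 2 * eta + tol:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    f_vals = [rng.uniform(0.0, 1.0) for _ in range(200)]
    g_vals = [v + rng.uniform(-0.1, 0.1) for v in f_vals]
    print(check_lemma2(f_vals, g_vals, xi=0.05))
    # Two-point example in which the bound is met with equality.
    print(check_lemma2([1.0, 0.0], [1.5, -0.5], xi=1.0))
```

The two-point example mirrors the equality case: $f$ overshoots $g$ by $\eta$ at the true minimizer and undershoots it by $\eta$ at the approximate one.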



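The following lemma is reminiscent of [14, Lemma 64]. 

Lemma 3. Let $\mathcal{D}$ be a compact metric space, and let $d : \mathcal{D}^2 \mapsto \mathbb{R}_+$ be a metric. Fix $f : \mathcal{D} \mapsto \mathbb{R}$ 
and $g : \mathcal{D} \mapsto \mathbb{R}$. Let 

$$\mathcal{D}^\star = \left\{ x \in \mathcal{D} : f(x) = \max_{y \in \mathcal{D}} f(y) \right\} \tag{221}$$

Suppose that for some constants $\ell > 0$, $L > 0$, we have, for all $(x, x^\star) \in \mathcal{D} \times \mathcal{D}^\star$, 

\begin{align}
f(x^\star) - f(x) &\ge \ell d^2(x, x^\star) \tag{222} \\
|g(x^\star) - g(x)| &\le L d(x, x^\star) \tag{223}
\end{align}

Then, for any positive scalars $\varphi$, $\psi$, 

$$\max_{x \in \mathcal{D}} \left[ \varphi f(x) \pm \psi g(x) \right] \le \varphi f(x^\star) \pm \psi g(x^\star) + \frac{L^2 \psi^2}{4 \ell \varphi} \tag{224}$$

Moreover, if, instead of (222), $f$ satisfies 

$$f(x^\star) - f(x) \ge \ell d(x, x^\star) \tag{225}$$

then, for any positive scalars $\psi$, $\varphi$ such that 

$$L \psi \le \ell \varphi \tag{226}$$

we have 

$$\max_{x \in \mathcal{D}} \left[ \varphi f(x) \pm \psi g(x) \right] \le \varphi f(x^\star) \pm \psi g(x^\star) \tag{227}$$

Proof of Lemma 3: Let $x_0$ achieve the maximum on the left side of (224). Using (222) 
and (223), we have, for all $x^\star \in \mathcal{D}^\star$, 

\begin{align}
0 &\le \varphi \left( f(x_0) - f(x^\star) \right) \pm \psi \left( g(x_0) - g(x^\star) \right) \tag{228} \\
&\le -\ell \varphi d^2(x_0, x^\star) + L \psi d(x_0, x^\star) \tag{229} \\
&\le \frac{L^2 \psi^2}{4 \ell \varphi} \tag{230}
\end{align}

where (230) follows because the maximum of (229) is achieved at $d(x_0, x^\star) = \frac{L \psi}{2 \ell \varphi}$. 
To show (227), observe using (225) and (223) that 

\begin{align}
0 &\le \varphi \left( f(x_0) - f(x^\star) \right) \pm \psi \left( g(x_0) - g(x^\star) \right) \tag{231} \\
&\le \left( -\ell \varphi + L \psi \right) d(x_0, x^\star) \tag{232} \\
&\le 0 \tag{233}
\end{align}

where (233) follows from (226). 

■ 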
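
A one-dimensional toy case shows that the slack term $L^2\psi^2 / (4\ell\varphi)$ in Lemma 3 cannot be improved in general. In the Python sketch below, the set is a grid on $[-1, 1]$ with the usual distance, $f(x) = -\ell x^2$ (or $f(x) = -\ell |x|$ for the second part of the lemma) is maximized at $x^\star = 0$, and $g(x) = Lx$ is $L$-Lipschitz. These specific functions, the grid, and the name `lemma3_gap` are assumptions chosen purely for illustration.

```python
def lemma3_gap(phi: float, psi: float, L: float = 0.7, ell: float = 1.0,
               sign: int = 1, linear_growth: bool = False, grid: int = 2001) -> float:
    """Right side minus left side of the Lemma 3 bound on a grid over [-1, 1].

    With linear_growth=False, f(x) = -ell*x^2 and the bound carries the slack
    (L*psi)^2 / (4*ell*phi); with linear_growth=True, f(x) = -ell*|x| and the
    bound has no slack, provided L*psi <= ell*phi.
    """
    xs = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    f = (lambda x: -ell * abs(x)) if linear_growth else (lambda x: -ell * x * x)
    g = lambda x: L * x
    lhs = max(phi * f(x) + sign * psi * g(x) for x in xs)
    slack = 0.0 if linear_growth else (L * psi) ** 2 / (4 * ell * phi)
    rhs = phi * f(0.0) + sign * psi * g(0.0) + slack  # x* = 0 maximizes f
    return rhs - lhs


if __name__ == "__main__":
    print(lemma3_gap(phi=2.0, psi=1.0))                      # quadratic growth
    print(lemma3_gap(phi=1.0, psi=1.0, linear_growth=True))  # linear growth
```

In both calls the gap is nonnegative and essentially zero, so the toy case meets the bounds with equality up to floating-point rounding.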

The following lemma deals with asymptotic behavior of the Q-function. 

Lemma 4. Fix a > 0, b > 0. Then, there exists q > (explicitly computed in the proof) such 
that for all z > — ^ anJ a// n Zarge enough, 



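Proof of Lemma 4: 
$Q(x)$ is convex for $x \ge 0$, and $Q'(x) = -\frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$, so for $x \ge 0$, $\xi \ge 0$ 

$$Q(x + \xi) \ge Q(x) - \frac{\xi}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \tag{235}$$

while for arbitrary $x$ and $\xi \ge 0$, 

$$Q(x + \xi) \ge Q(x) - \frac{\xi}{\sqrt{2\pi}} \tag{236}$$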
If $z \ge \frac{a}{\sqrt{n}}$, we use (235) to obtain 



Q[z- 4= J - Q\ z+ ~^z 2 ) (237) 



n / \ \/n 



, 2 



1 V z vW / b 2 a 



'-W/^KTn^^ <238) 



te 2 ( z naQ a 

< ~7=e a + (239) 

y2ixn \f2im 

36e~5 + a 

V27rn 

where (12401) holds for n large enough because the maximum of (|239l) is attained at z = 



\/ 2 + 2n + 2^- 



If < z < 4j, we use (|236l) to obtain 



<2|z -^)-°(- + ^ 2 ) £ i(^ 2 + ^ 1 (241) 



V27m V n 

If -it < ^ < 0, we use Q(x) = 1 - to obtain 



< 1 + — (242) 



Q\\z\-^)-Q\\4 + ^\ (243) 



1 ^i 1 -^) 2 ( b 2 a , 
< -^=e 5 -^z 2 + -f= (244) 



27r V vn v n 



bz 2 ' V~^ , 

< -=e 3 + -== (245) 

bz 2 z 2 a _ 

< -j=e~ + -j= (246) 
y/2im y/2irn 

< Abe' 1 + a ^ 

where to justify (12461) . which holds for n large enough, observe that for such n the function in 
(12461) is monotonically decreasing for |z| > v3, so the maximum of (12461) is attained at some 
\z\ < y/3. But for such \z\, we may lower bound (l — ^\z\) > \ in (|245l) for sufficiently 
large n, and (12461) follows. ■ 
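We are now equipped to prove Lemma 1. 

Proof of Lemma 1: 
Define the following functions $\mathcal{P} \mapsto \mathbb{R}_+$: 

\begin{align}
I(P_X) &= I(X; Y) = \mathbb{E}\left[ \imath_{X;Y}(X; Y) \right] \tag{248} \\
V(P_X) &= \mathbb{E}\left[ \mathrm{Var}\left[ \imath_{X;Y}(X; Y) \mid X \right] \right] \tag{249} \\
T(P_X) &= \mathbb{E}\left[ \left| \imath_{X;Y}(X; Y) - \mathbb{E}\left[ \imath_{X;Y}(X; Y) \mid X \right] \right|^3 \right] \tag{250}
\end{align}

If $P_X = \mathrm{type}(x^n)$, then for each $a \in \mathcal{A}$, there are $nP_X(a)$ occurrences of $P_{Y|X=a}$ among the 
$\{P_{Y|X=x_i}, i = 1, 2, \ldots, n\}$, and in the sequel we will invoke Theorem 24 with $W_i = \imath_{X;Y}(x_i; Y_i)$, 
where $x^n$ is a given sequence, and (201)-(203) become 

\begin{align}
D_n &= \frac{1}{n} \sum_{a=1}^{|\mathcal{A}|} nP_X(a)\, \mathbb{E}\left[ \imath_{X;Y}(a; Y) \mid X = a \right] \tag{251} \\
&= I(P_X) \tag{252} \\
V_n &= \frac{1}{n} \sum_{a=1}^{|\mathcal{A}|} nP_X(a)\, \mathrm{Var}\left[ \imath_{X;Y}(a; Y) \mid X = a \right] \tag{253} \\
&= V(P_X) \tag{254} \\
T_n &= \frac{1}{n} \sum_{a=1}^{|\mathcal{A}|} nP_X(a)\, \mathbb{E}\left[ \left| \imath_{X;Y}(a; Y) - \mathbb{E}\left[ \imath_{X;Y}(a; Y) \mid X = a \right] \right|^3 \,\Big|\, X = a \right] \tag{255} \\
&= T(P_X) \tag{256}
\end{align}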



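Define the (Euclidean) $\delta$-neighborhood of the set of capacity-achieving distributions $\mathcal{P}^\star$, 

$$\mathcal{P}^\star_\delta = \left\{ P_X \in \mathcal{P} : \min_{P_{X^\star} \in \mathcal{P}^\star} |P_X - P_{X^\star}| \le \delta \right\} \tag{257}$$

We split the domain of the minimization in the left side of (211) into two sets, $\mathrm{type}(x^n) \in 
\mathcal{P}^\star_{\delta[n]}$ and $\mathrm{type}(x^n) \in \mathcal{P}_{[n]} \setminus \mathcal{P}^\star_\delta$ (recall notation (205)), for an appropriately chosen $\delta > 0$. 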

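We now show that (211) holds for all $\Delta \le \frac{1}{2}\Delta_1$ if the minimization is restricted to types in 
$\mathcal{P}_{[n]} \setminus \mathcal{P}^\star_\delta$, where $\delta > 0$ is arbitrary, and 

$$\Delta_1 = C - \max_{P_X \in \mathcal{P}_{[n]} \setminus \mathcal{P}^\star_\delta} I(P_X) > 0 \tag{258}$$

By Chebyshev's inequality, for all $x^n$ whose type belongs to $\mathcal{P}_{[n]} \setminus \mathcal{P}^\star_\delta$, 

\begin{align}
&\mathbb{P}\left[ \sum_{i=1}^{n} \imath_{X;Y}(x_i; Y_i) > n(C - \Delta) \right] \tag{259} \\
&= \mathbb{P}\left[ \sum_{i=1}^{n} \imath_{X;Y}(x_i; Y_i) - nI(P_X) > n(C - I(P_X)) - n\Delta \right] \tag{260} \\
&\le \mathbb{P}\left[ \sum_{i=1}^{n} \imath_{X;Y}(x_i; Y_i) - nI(P_X) > \frac{n\Delta_1}{2} \right] \tag{261} \\
&\le \mathbb{P}\left[ \left( \sum_{i=1}^{n} \imath_{X;Y}(x_i; Y_i) - nI(P_X) \right)^2 > \frac{n^2 \Delta_1^2}{4} \right] \tag{262} \\
&\le \frac{4 n V(P_X)}{n^2 \Delta_1^2} \tag{263} \\
&\le \frac{4 \bar{V}}{n \Delta_1^2} \tag{264}
\end{align}

where in (261) we used 

$$\Delta \le \frac{1}{2}\Delta_1 < \Delta_1 \le C - I(P_X) \tag{265}$$

and 

$$\bar{V} = \max_{P_X \in \mathcal{P}} V(P_X) \tag{266}$$

Note that $\bar{V} < \infty$ by Property 1 below. Therefore, 

\begin{align}
&\min_{\mathrm{type}(x^n) \in \mathcal{P}_{[n]} \setminus \mathcal{P}^\star_\delta} \mathbb{P}\left[ \sum_{i=1}^{n} \imath_{X;Y}(x_i; Y_i) \le n(C - \Delta) \right] \tag{267} \\
&\ge 1 - \frac{4 \bar{V}}{n \Delta_1^2} \tag{268} \\
&\ge \mathbb{P}\left[ \sum_{i=1}^{n} \imath^\star_{X;Y}(x_i^\star; Y_i^\star) \le n(C - \Delta) \right] - \frac{4 \bar{V}}{n \Delta_1^2} \tag{269}
\end{align}

We conclude that (211) holds if the minimization is restricted to types in $\mathcal{P}_{[n]} \setminus \mathcal{P}^\star_\delta$. 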

Without loss of generality, we assume that all outputs in B are accessible (which implies that 
P Y * (y) > for all y e B) and choose 8 > so that for all P x e V} and y e B, 



Py(y) > 



(270) 



We recall the following properties of the functions $I(\cdot)$, $V(\cdot)$ and $T(\cdot)$ from [14, Appendices E 
and I]. 

Property 1. The functions $I(P_X)$, $V(P_X)$ and $T(P_X)$ are continuous on the compact set $\mathcal{P}$, and 
therefore bounded and achieve their extrema. 

Property 2. There exists $\ell_1 > 0$ such that for all $(P_{X^\star}, P_X) \in \mathcal{P}^\star \times \mathcal{P}^\star_\delta$, 

$$C - I(P_X) \ge \ell_1 |P_X - P_{X^\star}|^2 \tag{271}$$

Property 3. In $\mathcal{P}^\star_\delta$, the functions $I(P_X)$, $V(P_X)$ and $T(P_X)$ are infinitely differentiable. 

Property 4. In $\mathcal{P}^\star$, $V(P_X) = \mathrm{Var}\left[ \imath_{X;Y}(X; Y) \right]$. 

Due to Property 3, there exist nonnegative constants $L_1$ and $L_2$ such that for all $(P_X, P_{X^\star}) \in 
\mathcal{P}^\star_\delta \times \mathcal{P}^\star$, 

\begin{align}
C - I(P_X) &\le L_1 |P_X - P_{X^\star}| \tag{272} \\
|V(P_X) - V(P_{X^\star})| &\le L_2 |P_X - P_{X^\star}| \tag{273}
\end{align}

To treat the case $\mathrm{type}(x^n) \in \mathcal{P}^\star_{\delta[n]}$, we will need to choose $\delta > 0$ carefully and to consider the cases 
$V_{\min} > 0$ and $V_{\max} = 0$ separately. 

A. V min > 0. 

We decrease 5 until 

V min < 2 min V (Px) {21 A) 

Knnn^, A\ (275) 



are satisfied, in addition to (|270l) . where V, L\, L 2 are defined in (12661) . (|272l) and (12951) , 
respectively. 

We now show that (|21 1|) holds if the minimization is restrained to types in Vg, n u for all 
A > —A, for an appropriately chosen A > 0. Using (12741) and boundedness of T(Px), write 

c T(Px) 2lc T 
B = max — 5 < ^ — < oo (276) 

p^n vl(p x ) ~ v\ 

* m i rt 
where 



T = max T(Px) < oo 
Therefore, for any x n with type(x n ) 6 'P&fav me Berry-Esseen bound yields: 

B 



(277) 



P 



X/xjYte;^) <n(C-A) 



i=i 



Q(t(Px)) 



< 



n 



where 



nI(P x ) - nC + nA 



We now apply Lemma [2] with V = V\ ^ and 

f(P x ) = Q(v(P x )) 



p 



J>x ; y(^-) <n(C-A) 



i=i 



(278) 
(279) 

(280) 
(281) 



Condition (|213l) of Lemma [2] holds with 77 = -4= due to (12781) . As will be shown in the sequel, 
the following version of condition (12151) holds: 



QHn(P x *)))< min Q( u (P K )) + -^= 



(282) 



where U(Px), the minimum Euclidean distance approximation of Px in the set of n-types, is 
formally defined in (12061) . and q > will be chosen later. We proceed to apply Lemma [H and 
(12141) leads to 



min P 

type(a:«)eP* w 



> P 



J^xjYte;^) <n(C- A) 

.i=l 

n 

X)»x,Y«;yi)<n(C-A) 



i=l 



g + 2P 



(283) 



We conclude that (12111) holds if minimization is restrained to types in Vg, n y 

We proceed to show (12821) . As will be proven later, for appropriately chosen L > and L > 
it holds that 



< max u(P x ) (285) 

< max v(Py) (286) 

J^A JUL 2 A 2 

< + 2 ( 287 ) 



where P x * e P* in if A > and P x * e P max if A < 0. 
Denote 



2 

V{Px*)L 2 
AK 
V^A 



(288) 
(289) 
(290) 



If 



M 

A > -777^= = "A (291) 

V •'max 

then 2; > and Lemma H applies to z. So, using (12841) . (|287l) . the fact that Q(-) is 

monotonically decreasing and Lemma HI we conclude that there exists a q > such that 

Q(i/(n(JV)))- nun Q(f(fl()) 



Q (v(n(P x *))) - Q max ^(P x ) (292) 
<q( z _^)_ Q ( z+ * A (293) 



< 4= (294) 

'n 



which is equivalent to (12821) . 

It remains to prove (12841) and (12871) . Using (12741) . observe that in V\, the gradient with respect 
to P x satisfies 

1 <L = ^ (295) 

V3„ 
so recalling (272) and denoting $\zeta = |P_X - P_{X^\star}|$, we have 

C - Px ~ A + 14(^ + 1*0 (296) 



A 



< ^== + LC (297) 



where 

T, = 



L = -j=L= + LA + PM (298) 



So, (12841 ) follows by observing 



|Px-n(P x )| < ^1 (299) 



To show (12871) . we apply Lemma [3] with 

P = P 5 * (300) 

P* = P* (301) 

9? = (302) 

^ = y/n\A\ (303) 

/(^x) = ^=^ (304) 



x 



glP ^7m <305) 

Let us verify that conditions of Lemma [3] hold. Function g satisfies condition (12231) with L 
defined in (I295K Let us now show that function / satisfies condition (12221) with i = — 

_ 2\JV 

where V and £i were defined in (12661) and (12711) . respectively. Write 

max /(P x ) - /(P x ) = C Z 7 ^ ) (306) 



Pxev 



= (C-/(P X ))<?(P X ) (307) 

> (C - /(Px)) (^max^(Px) - 2L5) (308) 

> (C-J(P X )) ^-L_2L^ (309) 
>(C-/(P X )) J_ (310) 

> -^|Px-Px*| 2 (311) 
2VT 
where 

• (308) follows from (225) and (257); 
• (309) uses notation (266); 
• (310) follows from (275); 
• (311) applies (271). 

So, Lemma [3] applies to z^(Px) = <ff{Px) + sign(A)?/>g(P x ), resulting in (1287L thereby com- 
pleting the proof of (|283l) . 

Combining (12691) and (12831) . we conclude that (12111) holds for all A in the interval 

A, 



- < A < 



(312) 



B. Knax — 0. 

We choose 5 so that (|270l) is satisfied. The case type(x ra ) ^ was covere d m (1269k so 

we only need to consider minimization of the left side of (12121) over Vgr n y Fix a < §. If 

34L§ 1 



A < 



we have 



P 



< P 



< 



n 

J^v^y*) > n(C - A) 



i=l 



X] *x ; y(^; - n/(P x ) > n(C - J(P X )) + n|A| 



n(C-/(P x ) + |A| 



< 



L|P X -P X * 




n(£i 


Px 


-Px* 


2 + |A|) 2 



< 



< 



'1 + 3£i) 2 n|A|§ 



I_3 Q 
77,4 2" 

where 

• (13161 ) is by Chebyshev's inequality; 

. (13171) uses (127TT) . (12951) and V max = 0; 

• (13181) holds because the maximum of its left side is attained at |P X — Px*| 2 = j^. 



(313) 

(314) 

(315) 
(316) 
(317) 

(318) 
(319) 
Appendix C 
Proof of the converse part of Theorem 9 

Note that for the converse, restriction (iv) can be replaced by the following weaker one. 
(iv') The random variable $\jmath_S(S, d)$ has finite absolute third moment. 
To verify that (iv) implies (iv'), observe that by the concavity of the logarithm, 

< 3s (s, d) + \*d < A*E [d (s, Z*)] (320) 



so 



E [|j s (S, d) + X*d\ 3 ] < A* 3 E [d 3 (S, T 



(321) 



We now proceed to prove the converse by showing first that we can eliminate all rates exceeding 



k 



> 



C 



n ~ R(d) - 3r 



(322) 



for an arbitrary < r < More precisely, we show that the excess-distortion probability of 
any code having such rate converges to 1 as n — > oo, and therefore for any e < 1, there is an 
n such that for all n > no, no (k, n, d, e) code can exist for k, n satisfying (|322l) . 

We weaken (|20l) by fixing 7 = kr and choosing a particular output distribution, namely, 
Pyn = Pyn* = Py* x ... x P Y *. Due to dHJ) , P* k = P£ x . . . x P£, the d-tilted information 
single-letterizes, that is, for a.e. s k , 

k 

3sk (s k ,d) = J2js(s l ,d) (323) 

i=l 

so Theorem Q] implies that the parameters of every (k, n, d, e') code must satisfy 



e' > E 



min P 



J2^i,d) -J2^y*(^ Y i) >kr\S k 



i=l 



3=1 



exp (— kr) 



> min P 



> min P 

x n £A n 



^2 «X;Y*(^i! Yi) <nC + kr 
.i=i 



^2 «X;Y*(^i5 Y t ) <nC + fir' 
.i=i 



P 



^2js{Si,d) > nC + 2kr 



8=1 



P 



^Js(^,rf)> kR(d)-kr 



i=l 



(324) 

— exp (—kr) 

(325) 
exp (—kr) 
(326) 
where in (326), we used (322) and t' 



Ct 



r ,q_ 3t > 0. Recalling (PT21) and 

E[2 X;Y *(x;Y)|X = x]<C 



(327) 



with equality for Px*-a.e. x, we conclude using the law of large numbers that (13261) tends to 1 

as k, n — > oo. 

We proceed to show that for all large enough k, n, if there is a sequence of (k, n, d, e') codes 
such that 



-3kr <nC - fcP(d) 



(328) 
(329) 



< y/nV + kV{d)Q- 1 (e) + 9 (n) 
then e' > e. 

Note that in general the bound in Theorem Q] does not lead to the correct channel dispersion 
term. We first consider the general case, in which we apply Theorem [31 and then we show the 
symmetric case, in which we apply Theorem [2] 

Recall that x n G A n has type Px if the number of times each letter a E A is encountered in 
x n is nPx(a). In Theorem [3j let t index the types Px of sequences in X = A n . Note that the 
total number of types satisfies [20J T < [n + l)'- 4 ' -1 . We will weaken the outer supremum in 
(1371) by fixing a particular collection of output distributions, namely, Py« = Py x . . . x Py, 

^ type(3; n ) 

where P x — > Py\x — > Py, i-e- Py is the output distribution induced by Px = type(x n ). In this 
way, Theorem [3] implies that every (k,n,d,e') code must satisfy 



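$$\epsilon' \ge \mathbb{E}\left[ \min_{x^n \in \mathcal{A}^n} \mathbb{P}\left[ \sum_{i=1}^{k} \jmath_S(S_i, d) - \sum_{i=1}^{n} \imath_{X;Y}(x_i; Y_i) \ge \gamma \,\Big|\, S^k \right] \right] - (n+1)^{|\mathcal{A}|-1} \exp(-\gamma) \tag{330}$$

Choose 

$$\gamma = \left( |\mathcal{A}| - \frac{1}{2} \right) \log(n+1) \tag{331}$$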
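
The choice $\gamma = \left(|\mathcal{A}| - \frac{1}{2}\right)\log(n+1)$ is made so that the penalty for the number of types, $(n+1)^{|\mathcal{A}|-1}\exp(-\gamma)$, collapses to $1/\sqrt{n+1}$. The short Python sketch below confirms this numerically. It assumes natural logarithms paired with the natural exponential, and the function name `type_penalty` is introduced here for illustration only.

```python
from math import exp, log, sqrt


def type_penalty(n: int, alphabet_size: int) -> float:
    """(n+1)^(|A|-1) * exp(-gamma) with gamma = (|A| - 1/2) * log(n+1)."""
    gamma = (alphabet_size - 0.5) * log(n + 1)
    return (n + 1) ** (alphabet_size - 1) * exp(-gamma)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, type_penalty(n, alphabet_size=3), 1 / sqrt(n + 1))
```

The penalty therefore decays like $n^{-1/2}$ regardless of the alphabet size, which is the same order as the Berry-Esseen correction.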



At this point we consider two cases separately, V > and V = 0. 



A. V > 0. 

In order to apply Lemma Q] in Appendix [Bj we isolate the typical set of source sequences: 



7k, n 



^2js(si,d) -nC-7 
i=i 



< nA 



(332) 
Observe that 



p [s k i %, n ] = p 



^2js(Si,d) - nC 



< P 



< P 



< 



i=i 

k 



> nA — 7 



J2js(S l ,d)-kR(d) 



i=l 
k 



J2js(S l ,d)-kR(d) 



i=l 



> k 



nC - kR(d) \ +7 > nA 
AR{d) 



2C 



AC 2 V(d) 



R 2 (d)A 2 k 
where 

• (13351) follows by lower bounding 



(333) 

(334) 

(335) 
(336) 



nA - 7 - \nC - kR(d)\ > nA - 7 - 3kr 

kA 



> — (R(d) - 3r) - 7 - 3kr 



> k 



AR(d) 
2C 



(337) 
(338) 
(339) 



where 



- (13371) holds for large enough n due to (13281) and (13291) : 

- (13381) lower bounds n using (13281) : 

- (13391) holds for a small enough r > 0. 
• (13361) is by Chebyshev's inequality. 

Now, we let 

B 1 AC 2 V(d) 

e fc>n = e + + ^_ + p2 ,,, A2 (340) 

y/n + k yjn + 1 R 2 {d)A 2 k 

where £> > will be chosen in the sequel, and k, n are chosen so that both (13281) and the 
following version of (13291 ) hold: 



nC - kR(d) <\\nV + kV(d) - j/ 1 ^ Q' 1 (e fc ,„) - 7 



(341) 



2(n + fc) 

where L < 00 is the maximum absolute value of the gradient of the function V(-) (defined in 
(12491 )) over the set P/ (in (I257D ). Denote for brevity 

r{x n , y n , s n ) = «X;Y(x i; yi ) - ^ j s (^, d) (342) 



i=i 



i=i 
Weakening (330) using (331) and Lemma 1, we have 



e' > E 



> E 



min P [r(x n , F n , S n ) < -7 I S fc l • 1 \S k E % n \ 

,nrz An <- 1 J I > J 



1 



Vn + 1 



p [r(x n *, r n , s n ) < -7 I s fc ] ■ 1 g r*,„} 

= P [r(x n *, Y n , S n ) < -7, S k e %, n ] - k 

> P [r{x n \ r, S n ) < -7] - p [s k i T k , n ] 

> P [r{x n \ Y n , S n ) < -7] 

> e 



K 



n -s/n + 1 



1 



a/w + 1 

K 1 



AC 2 V(d) K 1 



R 2 (d)A 2 k ^n~ v^TT 



(343) 

(344) 

(345) 
(346) 
(347) 
(348) 



where (13441) is by Lemma [Q and (13461) is by the union bound. To justify (13481) . observe that the 
quantities in Theorem [24] corresponding to the sum of independent random variables in (13471) 
are 



D 



n+k 



< 



> 



T 



n+k 



n 


n 


+ k 




n 


n 


+ k 




n 


n 


+ k 




n 


n 


+ k 




n 


n 


+ k 



/(n(PxO) 



n + k 



C 



k 



-R(d) 



V(U(P X ,)) + 



n + k 



V + 



-V(d) 



R(d) 

V{d) 
2(n + k) 



k 



T(n(PxO) + [|js(S, d) - R(d)\ 3 ] 



(349) 
(350) 
(351) 

(352) 
(353) 



where the functions ),!(•), V(-), T(-) are defined in (12061) . (l248l)-(f250T) in Appendix E To 
show (13521) , recall that V(Px*) = V by Property |4] in Appendix |B] and use (12991) and Lipschitz 
continuity of V(Py) (Property [3] in Appendix |B]). Further, T n+k is bounded uniformly in P x > so 
(12041) is upper bounded by some constant B > 0. Finally, using (|350l) and (13521) in (13681) . we 
conclude that 

- 7 > {n + k)D n+k - yfo + k)V n+k Q- 1 (e fc , n ) (354) 



which enables us to lower bound the probability in (|347l) invoking the Berry-Esseen bound 
(Theorem [24]) . In view of (13401) . the resulting bound is equal to e, and the proof of (13481) is 
complete. 
B. V = 0. 



Fix an < a < ~. 



If V(d) > 0, we choose 7 as in (13311 ), and 



B , 
efc,„ = e + -j= + [n 



+ l)l- 4 l- 1 exp(- 7 ) + ^i T 



77,4 2 C 



(355) 



where £> > is the same as in (13401) . and k, n are chosen so that both (13281) and the following 
version of (13291 ) hold: 



nC - kR(d) < ^/kV(d)Q- 1 (e fc ,„) - 7 - An^ a 
where A > was defined in Lemma Q] Weakening (13301) using (12121) . we have 



(356) 



e > P 



> P 



> e 



nC < ^jsC^d) -7- An?' 



«4 2 ( 



1=1 



( n + l)W-iexp(- 7 ) 
1 



I_3 ( 
«4 2 C 



(357) 

n + l) | - 4| - 1 exp(- 7 ) (358) 

(359) 



where (13581) uses (13561) . and (13591) is by the Berry-Esseen bound. 
If V(d) = 0, which implies js(Si, d) = R(d) a.s., we let 



7 = (|^| - 1) log(n + 1) - log (1 - e j 

and choose fc, n that satisfy (13281) and 

kR{d) - nC > 7 + An5~' 
Then, plugging js(Si,d) = R(d) a.s. in (13301) . we have 



1 



77,4 2 ( 



e' > min 



i=i 



> P 
= 1 



nC < kR(d) - 7 - A«2" 
1 



- (n + l) | - 4| - 1 exp(- 7 ) 
--(n + l) | - 4| - 1 exp(- 7 ) 



( n + 1)1-^1-1 exp (-7) 



(360) 
(361) 

(362) 

(363) 
(364) 
(365) 



where (13631) invokes (12121) . (|364l) is by the choice of n, and (13651) follows from the choice of 



7- 
C. Symmetric channel. 

We show that if the channel is such that the distribution of «x,y*( x ; Y*) does not depend on 
the choice x 6 A, Theorem |2] leads to a tighter third-order term than (|1 1 II) . 
If either V > or V(d) > 0, let 



7 = 2 lo § n 

B 1 

Cfc,n — e H ; + 



(366) 
(367) 



y/n + k \fn 

where B > can be chosen as in (13401) . and let k, n be such that the following version of (13291) 
(with the remainder 6{n) satisfying (|1 1 1 1) with c = |) holds: 

- fci?(d) < v/nF + fcV(d)Q _1 (e fe , n ) - 7 (368) 

Theorem[2]and Theorem l24l imply that any (fc, n, d, e') code must satisfy, for an arbitrary sequence 



x n e A n , 



e' > P 



> e 



^JS^d) -J^«X;Y*(^;>i) > 7 
.i=l J=l 



- exp (-7) 



If both V = and V(c?) = 0, choose k, n satisfy 

kR(d) - nC > 7 



log 



1 - e 



(369) 
(370) 

(371) 
(372) 



Substituting (13721) and js{Si,d) = R(d), ix-y*(xi;Yi) = C a.s. in (13691) , we conclude that the 
right side of (13691 ) equals e, so e' > e whenever a [k,n,d,e') code exists. 



Z). Gaussian channel 

In view of Remark it suffices to consider the equal power constraint (I125I ). The spherically- 
symmetric Pyn = P Y n* = Py* x ... x P Y * , where Y* ~ A/"(0, £7^(1 + P)), satisfies the symmetry 
assumption of Theorem |2] In fact, for all x n G ^(a), n ; Y n ) has the same distribution 

under P~Y n I X' n =x n (cf. (fT58T» 
e < E 


exp 


f 









where Wi ~ M yjpi lj > independent of each other. Since G n is a sum of i.i.d. random variables, 
the mean of ^ is equal to C = \ log (1 + P) and its variance is equal to (11101) . the result follows 
analogously to (I366D - (I370D . 

Appendix D 

Proof of the achievability part of Theorem [9] 
A. Almost lossless coding (d = 0) over a DMC. 

The proof consists of an asymptotic analysis of the bound in Theorem [8] by means of Theorem 
l24l Weakening (11051) by fixing P x ™ = -P£n = -Px* x . . . x P x * , we conclude that there exist a 
(k, n, 0, e') code with 

5>X ;Y (*M*)-E^) I < 374 > 

i=l i=l 

where (S k , X n * ,Y n *) are distributed according to P S kPx^*Py n \x n - The case of equiprobable S 
has been tackled in |[T4|| . Here we assume that «s(S) is not a constant, that is, Var ^s(S)] > 0. 
Let $k$ and $n$ be such that

$$nC - kH(S) \geq \sqrt{nV + k\mathcal{V}}\, Q^{-1}\left(\epsilon - \frac{B + 1}{\sqrt{n + k}}\right) + \frac{1}{2}\log(n + k) \tag{375}$$

where $\mathcal{V} = \mathrm{Var}[\imath_S(S)]$, and $B$ is the Berry-Esseen ratio (204) for the sum of $n + k$ independent
random variables appearing in the right side of (374). Note that $B$ is finite due to:
• $\mathrm{Var}[\imath_S(S)] > 0$;

• the third absolute moment of $\imath_S(S)$ is finite;

• the third absolute moment of $\imath^\star_{X;Y}(X^\star; Y^\star)$ is finite, as observed in Appendix B.
Therefore, (375) can be written as (106) with the remainder therein satisfying (117). So, it
suffices to prove that if $k$, $n$ satisfy (375), the right side of (374) is upper bounded by $\epsilon$. Let

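$$\mathcal{T}_{k,n} = \Bigg\{(s^k, x^n, y^n) \in \mathcal{S}^k \times \mathcal{A}^n \times \mathcal{B}^n : \sum_{i=1}^{n} \imath^\star_{X;Y}(x_i; y_i) - \sum_{i=1}^{k} \imath_S(s_i) \geq nC - kH(S) - \sqrt{nV + k\mathcal{V}}\, Q^{-1}\left(\epsilon - \frac{B + 1}{\sqrt{n + k}}\right)\Bigg\} \tag{376}$$

By the Berry-Esseen bound (Theorem 24),

$$\mathbb{P}\left[(S^k, X^{n\star}, Y^{n\star}) \notin \mathcal{T}_{k,n}\right] \leq \epsilon - \frac{1}{\sqrt{n + k}} \tag{377}$$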
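We now further upper bound (374) as

$$\epsilon' \leq \mathbb{E}\left[\exp\left(-\left|\sum_{i=1}^{n} \imath^\star_{X;Y}(X_i^\star; Y_i^\star) - \sum_{i=1}^{k} \imath_S(S_i)\right|^{+}\right)\right] \tag{378}$$
$$\leq \exp\left(-\left(nC - kH(S) - \sqrt{nV + k\mathcal{V}}\, Q^{-1}\left(\epsilon - \frac{B + 1}{\sqrt{n + k}}\right)\right)\right) + \epsilon - \frac{1}{\sqrt{n + k}} \tag{379}$$
$$\leq \epsilon \tag{380}$$

where (380) follows from (375).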
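The last two steps can be traced numerically. The sketch below assumes natural logarithms and reads (375) as $nC - kH(S) \geq \sqrt{nV + k\mathcal{V}}\,Q^{-1}\left(\epsilon - \frac{B+1}{\sqrt{n+k}}\right) + \frac{1}{2}\log(n+k)$ and (379) as the exponential term plus $\epsilon - \frac{1}{\sqrt{n+k}}$. The function names and all parameter values (including the Berry-Esseen ratio) are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def q_inv(p):
    """Inverse of the standard Gaussian complementary cdf."""
    return NormalDist().inv_cdf(1.0 - p)


def rhs_375(n, k, V, V_src, B, eps):
    """Right side of (375) as read above; None if the Q^{-1} argument is invalid."""
    arg = eps - (B + 1.0) / sqrt(n + k)
    if not 0.0 < arg < 1.0:
        return None
    return sqrt(n * V + k * V_src) * q_inv(arg) + 0.5 * log(n + k)


def smallest_n(k, C, H, V, V_src, B, eps, n_max=10**6):
    """Smallest channel blocklength n for which (375) holds, by linear search."""
    for n in range(1, n_max + 1):
        rhs = rhs_375(n, k, V, V_src, B, eps)
        if rhs is not None and n * C - k * H >= rhs:
            return n
    return None


def bound_379(n, k, C, H, V, V_src, B, eps):
    """Right side of (379) as read above."""
    arg = eps - (B + 1.0) / sqrt(n + k)
    threshold = n * C - k * H - sqrt(n * V + k * V_src) * q_inv(arg)
    return exp(-threshold) + eps - 1.0 / sqrt(n + k)


# Toy parameters in nats (assumptions, not taken from the paper).
params = dict(k=1000, C=0.5, H=0.3, V=0.2, V_src=0.1, B=1.0, eps=0.1)
n = smallest_n(**params)
print(n, bound_379(n, **params))
```

Whenever (375) holds, the exponent in (379) is at least $\frac{1}{2}\log(n+k)$, so the exponential term is at most $\frac{1}{\sqrt{n+k}}$ and the printed bound does not exceed $\epsilon$.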



B. Lossy coding over a DMC.

The proof consists of the asymptotic analysis of the bound in Theorem 7 using Theorem
24 and Lemma 5 below, which deals with the asymptotic behavior of distortion $d$-balls. Note that
Lemma 5 is the only step that requires finiteness of the ninth absolute moment of $d(S, Z^\star)$, as
required by restriction (iv).

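Lemma 5 ([12, Lemma 2]). Under restrictions (ii)-(iv), there exist constants $k_0, c, K > 0$ such
that for all $k \geq k_0$,

$$\mathbb{P}\left[\log \frac{1}{P_{Z^{k\star}}(B_d(S^k))} \leq \sum_{i=1}^{k} \jmath_S(S_i, d) + \bar{c}\log k + c\right] > 1 - \frac{K}{\sqrt{k}} \tag{381}$$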



where c — c — §, with c given by (II 14|) . 



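We weaken (78) by fixing

$$P_{X^n} = P_{X^{n\star}} = P_{X^\star} \times \ldots \times P_{X^\star} \tag{382}$$
$$P_{Z^k} = P_{Z^{k\star}} = P_{Z^\star} \times \ldots \times P_{Z^\star} \tag{383}$$
$$\log M = kR(d) + 2k\Delta \tag{384}$$
$$\gamma = \frac{1}{2}\log_e k \tag{385}$$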
where $\Delta > 0$, so there exists a $(k, n, d, \epsilon')$ code with



e' < < E 



exp 



E4 ;Y (^;^)-iog^f 



d {S k )) 



+ E 



e-i-{l-P zk *{B d {S k ))) 



M 



E 



(1 - P zk *{B d {S k ))) 



M 



(386) 



where $(S^k, X^{n\star}, Y^{n\star}, Z^{k\star})$ are distributed according to $P_{S^k} P_{X^{n\star}} P_{Y^n|X^n} P_{Z^{k\star}}$. We need to show
that for $k$, $n$ satisfying (106), the right side of (386) is bounded by $\epsilon$. We begin by showing that
the last two terms in (386) are negligible. Denote

$$U_k = \log M - \sum_{i=1}^{k} \jmath_S(s_i, d) - \bar{c}\log k - c$$

where $\bar{c}$ and $c$ are defined in Lemma 5. The second term in (390) is bounded as,



E 



- (1 - P zk ,{B d {S k ))) 



M 



< e 



Vk, 



and the third term as, 



E 



(1 - P zk ,{B d {S k ))) 



M 



< E 



-MP zhi ,{B d {S k )) 



K 



< e r e - ex p(^)i + _ 



= E 

+ E 
K 



c -«p(tr fc)1 [ Uh<]Dg ^b. 



+ 



Vk 



< P 



U k < lot 



log e /c 



Vk 



U k > log — — 



+ 



K 



< P 

< P 
1 



£4 < lot 



log e /c 



+ 



1 



^2js(Si,d) > kR(d) + kA 



i=i 



+ 



K + l 

Vk 



< 



V(d) 



+ 



fc (i2(d) + A) 2 



(387) 

(388) 

(389) 
(390) 



(391) 



(392) 
(393) 

(394) 

(395) 
where

• (390) is by Lemma 5;

• (392) upper bounds $e^{-\exp(|U_k|^{+})}$ by $1$ and $\frac{1}{\sqrt{k}}$, respectively;

• (394) holds for large enough $k$ by the choice of $M$ in (384);

• (395) is by Chebyshev's inequality.

Note that the reasoning leading up to (392) follows that in [12, (107)-(110)].
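The Chebyshev step can be illustrated on a toy example. The sketch below assumes the step has the form $\mathbb{P}\left[\sum_{i=1}^{k} \jmath_S(S_i, d) > kR(d) + k\Delta\right] \leq \frac{\mathcal{V}(d)}{k\Delta^2}$, and replaces $\jmath_S(S_i, d)$ by a hypothetical two-valued random variable so that the tail probability can be computed exactly from the binomial distribution. All names and numbers are illustrative.

```python
from math import comb


def exact_tail_and_chebyshev(k, p, a, b, delta):
    """Exact tail of a sum of k i.i.d. two-valued variables vs. Chebyshev.

    Each variable equals a with probability p and b otherwise; it is a
    stand-in for the d-tilted information, not the real thing.
    """
    mean = p * a + (1.0 - p) * b                # plays the role of R(d)
    var = p * (1.0 - p) * (a - b) ** 2          # plays the role of V(d)
    tail = 0.0
    for m in range(k + 1):                      # m = number of variables equal to a
        total = m * a + (k - m) * b
        if total > k * (mean + delta):
            tail += comb(k, m) * p ** m * (1.0 - p) ** (k - m)
    chebyshev = var / (k * delta ** 2)
    return tail, chebyshev


tail, bound = exact_tail_and_chebyshev(k=200, p=0.3, a=2.0, b=0.5, delta=0.1)
print(tail, bound)
```

The exact tail is typically much smaller than the Chebyshev bound; the bound is used because it decays as $1/k$ for any fixed $\Delta > 0$, which is all the argument needs.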
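The first term in (386) is upper-bounded using Lemma 5 as:

$$\mathbb{E}\left[\exp\left(-\left|\sum_{i=1}^{n} \imath^\star_{X;Y}(X_i^\star; Y_i^\star) - \log\frac{\gamma H_M}{P_{Z^{k\star}}(B_d(S^k))}\right|^{+}\right)\right] \tag{396}$$
$$\leq \mathbb{E}\left[\exp\left(-\left|U_{k,n}\right|^{+}\right)\right] + \frac{K}{\sqrt{k}} \tag{397}$$

with

$$U_{k,n} = \sum_{i=1}^{n} \imath^\star_{X;Y}(X_i^\star; Y_i^\star) - \sum_{i=1}^{k} \jmath_S(S_i, d) - \bar{c}\log k - \log(\gamma H_M) - c \tag{398}$$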

We first consider the (nontrivial) case when $\mathcal{V}(d) + V > 0$. Let $k$ and $n$ be such that

nC - kR(d) > y/nV + fcV(d)Q _1 (e M ) + f c + log fc + log 7 # A / + c (399) 

V(d) + 3 



5 



1 



^k,n 



(400) 



v^+I fc + A) 2 ^ 
where constants $\bar{c}$ and $c$ are defined in Lemma 5 and $B$ is the Berry-Esseen ratio (204) for the

sum of $n + k$ independent random variables appearing in (397). Note that $B$ is finite because:

• either $\mathcal{V}(d) > 0$ or $V > 0$ by the assumption;

• the third absolute moment of $\jmath_S(S, d)$ is finite by restriction (iv), as spelled out in (321);

• the third absolute moment of $\imath^\star_{X;Y}(X^\star; Y^\star)$ is finite, as observed in Appendix B.
Since

$$H_M = \log_e M + O(1) \tag{401}$$

applying a Taylor series expansion to (399) with the choice of $M$ and $\gamma$ in (384) and (385), we
conclude that (399) can be written as (106) with the remainder term satisfying (112).
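The harmonic-number estimate (401) is easy to check numerically: the gap between $H_M = \sum_{m=1}^{M} \frac{1}{m}$ and $\log_e M$ stays bounded and converges to the Euler-Mascheroni constant, approximately 0.5772. The helper names below are hypothetical.

```python
from math import log


def harmonic_number(M):
    """H_M = 1 + 1/2 + ... + 1/M, computed by direct summation."""
    return sum(1.0 / m for m in range(1, M + 1))


def harmonic_minus_log(M):
    """The O(1) term in H_M = log_e M + O(1)."""
    return harmonic_number(M) - log(M)


for M in (10, 1_000, 100_000):
    print(M, harmonic_minus_log(M))
```

Since the gap is bounded, $\log(\gamma H_M)$ differs from $\log(\gamma \log_e M)$ only by a bounded amount, which is why it can be absorbed into the remainder term.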
It remains to further upper bound (397) using (399). Let

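$$\mathcal{T}_{k,n} = \Bigg\{(s^k, x^n, y^n) \in \mathcal{S}^k \times \mathcal{A}^n \times \mathcal{B}^n : \sum_{i=1}^{n} \imath^\star_{X;Y}(x_i; y_i) - \sum_{i=1}^{k} \jmath_S(s_i, d) \geq nC - kR(d) - \sqrt{nV + k\mathcal{V}(d)}\, Q^{-1}(\epsilon_{k,n})\Bigg\} \tag{402}$$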
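By the Berry-Esseen bound (Theorem 24),

$$\mathbb{P}\left[(S^k, X^{n\star}, Y^{n\star}) \notin \mathcal{T}_{k,n}\right] \leq \epsilon_{k,n} + \frac{B}{\sqrt{n + k}} \tag{403}$$

so the expectation in the right side of (397) is upper-bounded as

$$\mathbb{E}\left[\exp\left(-|U_{k,n}|^{+}\right)\right] \leq \mathbb{E}\left[\exp\left(-|U_{k,n}|^{+}\right) 1\left\{(S^k, X^{n\star}, Y^{n\star}) \in \mathcal{T}_{k,n}\right\}\right] + \mathbb{P}\left[(S^k, X^{n\star}, Y^{n\star}) \notin \mathcal{T}_{k,n}\right] \tag{404}$$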

< P [(^ y^) e r fcjn ] + e M + -^L= (405) 

where we used (399) and (402) to upper bound the exponent in the right side of (404).

Assembling (388), (395), (397) and (405), we conclude that the right side of (386) is indeed
upper bounded by $\epsilon$.

Finally, consider the case $V = \mathcal{V}(d) = 0$, which implies $\jmath_S(S, d) = R(d)$ and $\imath^\star_{X;Y}(X^\star; Y^\star) = C$
almost surely, and let $k$ and $n$ be such that

nC - kR{d) >c\ogk + log 7# M + c + log \^ (406) 

e 7F 

where constants $\bar{c}$ and $c$ are defined in Lemma 5. Then

E [exp {- \U k X}} < e " (407) 
which, together with (388), (395) and (397), implies that $\epsilon' \leq \epsilon$, as desired.

C. Lossy or almost lossless coding over a Gaussian channel 

In view of Remark 9, it suffices to consider the equal power constraint (125). As shown in
the proof of Theorem [FT} for any distribution of X n on the power sphere, 

$$\imath_{X^n;Y^n}(X^n; Y^n) \geq G_n - \log F \tag{408}$$

where $G_n$ is defined in (373) (cf. (158)) and $F$ is a (computable) constant.

Now, the proof for almost lossless coding can be modified to work for the Gaussian channel
by adding $\log F$ to the right side of (375) and replacing $\sum_{i=1}^{n} \imath^\star_{X;Y}(X_i^\star; Y_i^\star)$ in (374) and (378)
with $G_n - \log F$, and in (376) with $G_n$.

The proof for lossy coding in Appendix D-B is adapted for the Gaussian channel by replacing
all $\sum_{i=1}^{n} \imath^\star_{X;Y}(X_i^\star; Y_i^\star)$ with $G_n$, and all $H_M$ (except (401)) with $H_M F$.
Appendix E
Proof of Theorem 

Applying the Berry-Esseen bound to (179), we obtain

Di(n, e, a) 



. , ^ , /Var [d(S,Z)l , 
> min ^E[d(S,Z)] + W [ ^ ;J g~ ] 

/(S;Z)<C(a) 




D(C(a)) + X I^Q-i(e+j=) (409) 



where $B$ is the Berry-Esseen ratio, and (409) follows by the application of Lemma 3 with
V = {P sz = Pz\sPs - ^(S;Z) < C(ot)}, = 1, yj 2 = Note that E [d(S, Z)] is a linear 
function of $P_{SZ}$ and $\sqrt{\mathrm{Var}[d(S, Z)]}$ is a continuously differentiable function of $P_{SZ}$, so conditions
(223) and (225) hold with the metric being the usual Euclidean distance between vectors in
$\mathbb{R}^{|\mathcal{S}| \times |\hat{\mathcal{S}}|}$. So, (409) follows immediately upon observing that by the definition of the rate-distortion
function, $\mathbb{E}[d(S, Z)] \geq \mathbb{E}[d(S, Z^\star)] = D(C(\alpha))$ for all $P_{Z|S}$ such that $I(S; Z) \leq C(\alpha)$.